CN116701116A - Server fault prediction method and device, server and storage medium

Info

Publication number: CN116701116A
Application number: CN202310673523.1A
Authority: CN
Inventors: 杨虎; 耿志成; 郭锋
Original assignee: Jinan Inspur Data Technology Co Ltd
Current assignee: Jinan Inspur Data Technology Co Ltd
Priority date: 2023-06-06
Filing date: 2023-06-06
Publication date: 2023-09-05

Abstract

The invention relates to the technical field of server development and discloses a server fault prediction method, a device, a server and a storage medium. The server fault prediction method comprises the following steps: acquiring a log packet generated when a server runs, wherein the log packet comprises a plurality of pieces of log data; performing keyword matching between each piece of log data and a plurality of preset target keyword sets, and determining at least one target fault type corresponding to the server, wherein different target keyword sets correspond to different fault types; and obtaining a fault prediction result of the server according to the at least one target fault type. By detecting keywords in the log data, the fault types that may occur while the server is running can be predicted quickly, which improves the efficiency of targeted maintenance of the server. Moreover, the log data generated by the server is fully utilized, and the limitation that operation and maintenance personnel cannot maintain the server directly from the log data is overcome, which improves the applicability of the log data.

Description

Server fault prediction method and device, server and storage medium

Technical Field

The invention relates to the technical field of server development, in particular to a server fault prediction method, a device, a server and a storage medium.

Background

In the related art, once a server is determined to have a fault, operation and maintenance personnel investigate the possible causes item by item, relying on their own experience, until the cause of the fault is identified, and only then perform targeted maintenance on the server.

However, maintaining a server in this way consumes a large amount of time and cost, the maintenance efficiency is low, and the technical requirements on operation and maintenance personnel are high. Therefore, a method capable of improving maintenance efficiency is needed.

Disclosure of Invention

In view of the above, the present invention provides a server failure prediction method, device, server and storage medium, so as to solve the problem of low maintenance efficiency of the server.

In a first aspect, the present invention provides a server failure prediction method, including:

acquiring a log packet generated when a server runs, wherein the log packet comprises a plurality of pieces of log data;

performing keyword matching between each piece of log data and a plurality of preset target keyword sets, and determining at least one target fault type corresponding to the server, wherein different target keyword sets correspond to different fault types;

and obtaining a fault prediction result of the server according to the at least one target fault type.

In the method, the possible fault types of the server in the running process can be rapidly predicted by detecting the keywords of the log data, so that the efficiency of targeted maintenance of the server is improved. Moreover, the log data generated by the server can be fully utilized, and the defect that operation and maintenance personnel cannot directly maintain the server according to the log data is overcome, so that the applicability of the log data is improved.

In an optional implementation manner, keyword matching is performed on each log data and a plurality of preset target keyword sets respectively, and at least one target fault type corresponding to the server is determined, including:

determining a hardware module corresponding to the log data and determining a word set corresponding to the log data;

determining a plurality of associated keyword sets corresponding to the hardware modules from the plurality of target keyword sets;

matching the word set against each associated keyword set, and determining a target associated keyword set, wherein the target associated keyword set is the associated keyword set that contains the largest number of words from the word set;

and determining the fault type corresponding to the target associated keyword set as the target fault type.

In an alternative embodiment, obtaining a failure prediction result of the server according to at least one target failure type includes:

if only one target fault type is determined, the target fault type is used as a fault prediction result of the server;

if a plurality of target fault types are determined, taking, based on the number of occurrences of each target fault type, the target fault type with the largest number of occurrences as the fault prediction result of the server;

and if the target fault types have the same number of occurrences, taking the plurality of target fault types together as the fault prediction result of the server.

In an alternative embodiment, the method further comprises:

and determining a target diagnosis scheme corresponding to the fault prediction result based on the preset correspondence between the fault types and the diagnosis schemes.

In an alternative embodiment, the method further comprises:

acquiring a plurality of log packet samples, wherein the log packet samples comprise a plurality of pieces of problem log data;

clustering, based on the number of pieces of problem log data in each log packet sample and the attribute information of each piece of problem log data, the plurality of pieces of log data in each log packet sample to obtain a plurality of intermediate clusters corresponding to each log packet sample;

aggregating, based on a cluster distance threshold, the intermediate clusters corresponding to every two log packet samples to generate a plurality of target clusters;

and extracting target keywords from each target cluster to obtain a target keyword set corresponding to each target cluster.

In an alternative embodiment, the method further comprises:

respectively obtaining fault types corresponding to the log packet samples, and associating the fault types with a plurality of pieces of problem log data included in the corresponding log packet samples;

and determining the fault type corresponding to the target cluster based on the fault type corresponding to each problem log data in the target cluster.

In an alternative embodiment, the log packet sample includes a plurality of problem log files, and the attribute information includes the word set corresponding to each piece of problem log data and the hardware module corresponding to each piece of problem log data;

clustering, based on the number of pieces of problem log data in each log packet sample and the attribute information of each piece of problem log data, the plurality of pieces of log data in each log packet sample to obtain a plurality of intermediate clusters corresponding to each log packet sample includes:

clustering the pieces of problem log data in a problem log file, based on the word set of each piece of problem log data in the problem log file, the number of repetitions of each piece of problem log data in the problem log file, and the hardware module corresponding to each piece of problem log data, to obtain a plurality of initial clusters;

and merging, according to a target cluster number, the initial clusters corresponding to every two problem log files to generate that target number of intermediate clusters.

In an alternative embodiment, clustering the pieces of problem log data in the problem log file, based on the word set of each piece of problem log data in the problem log file, the number of repetitions of each piece of problem log data in the problem log file, and the hardware module corresponding to each piece of problem log data, to obtain a plurality of initial clusters includes:

obtaining a plurality of problem log data sets according to the hardware module corresponding to each piece of problem log data, wherein different problem log data sets correspond to different hardware modules;

counting the number of pieces of problem log data in each problem log data set;

and clustering the pieces of problem log data in each problem log data set, based on the word set of each piece of problem log data, the number of repetitions of each piece of problem log data in its problem log data set, and the number of pieces of data in that problem log data set, to obtain a plurality of initial clusters.

In an alternative embodiment, clustering the pieces of problem log data in each problem log data set, based on the word set of each piece of problem log data, the number of repetitions of each piece of problem log data in its problem log data set, and the number of pieces of data in that problem log data set, to obtain a plurality of initial clusters includes:

performing full-line matching between the current problem log data and each of the other problem log data in the problem log data set to obtain a first statistical value, wherein the first statistical value is the number of other problem log data that are completely identical to the current problem log data;

performing word matching between the word set of the current problem log data and the word set of each of the other problem log data in the problem log data set to obtain a second statistical value, wherein the second statistical value is the number of other word sets that contain at least one word identical to a word in the word set of the current problem log data;

determining the total number of words of the problem log data set based on the word set of each piece of problem log data;

and clustering the pieces of problem log data in the problem log data set based on the first statistical value, the number of pieces of data, the second statistical value and the total number of words, to obtain a plurality of initial clusters.
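The statistics named in the steps above can be illustrated with a minimal Python sketch. It only computes the four quantities for one piece of problem log data and does not implement the clustering itself, whose combining rule is not given here. The function name `pair_statistics`, the whitespace tokenization, and the reading of "total number of words" as the number of distinct words in the set are assumptions made for illustration.

```python
from typing import Dict, List


def pair_statistics(lines: List[str], index: int) -> Dict[str, int]:
    """Compute the statistics for lines[index] within one problem log data set."""
    current = lines[index]
    # Assumption: a word set is obtained by splitting on whitespace.
    current_words = set(current.split())
    others = lines[:index] + lines[index + 1:]

    # First statistical value: other lines identical to the current line.
    first = sum(1 for other in others if other == current)
    # Second statistical value: other word sets sharing at least one word.
    second = sum(1 for other in others if current_words & set(other.split()))

    # Assumption: "total number of words" counts distinct words in the set.
    all_words = set()
    for line in lines:
        all_words |= set(line.split())

    return {
        "first": first,
        "data_number": len(lines),
        "second": second,
        "total_words": len(all_words),
    }
```

For a set containing two identical lines, one partially overlapping line and one unrelated line, the first statistical value of the repeated line is 1 and its second statistical value is 2.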

In a second aspect, the present invention provides a server failure prediction apparatus, including:

a first acquisition module, used for acquiring a log packet generated when the server runs, wherein the log packet comprises a plurality of pieces of log data;

a matching module, used for performing keyword matching between each piece of log data and a plurality of preset target keyword sets and determining at least one target fault type corresponding to the server, wherein different target keyword sets correspond to different fault types;

and a first processing module, used for obtaining a fault prediction result of the server according to the at least one target fault type.

In a third aspect, the present invention provides a server comprising a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions so as to perform the server fault prediction method of the first aspect or any of its corresponding embodiments.

In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the server failure prediction method of the first aspect or any one of its corresponding embodiments.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow diagram of a server failure prediction method according to an embodiment of the present invention;

FIG. 2 is a flow chart of another server failure prediction method according to an embodiment of the present invention;

FIG. 3 is a flow chart of a method of generating a target keyword set according to an embodiment of the present invention;

FIG. 4 is a flow chart of yet another server failure prediction method according to an embodiment of the present invention;

FIG. 5 is a block diagram of a server failure prediction apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a hardware module structure of a server according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In view of the low maintenance efficiency described in the background, the server fault prediction method provided by the present invention includes: acquiring a log packet generated when the server runs, wherein the log packet comprises a plurality of pieces of log data; performing keyword matching between each piece of log data and a plurality of preset target keyword sets, and determining at least one target fault type corresponding to the server, wherein different target keyword sets correspond to different fault types; and obtaining a fault prediction result of the server according to the at least one target fault type. With this method, the fault types that may occur while the server is running can be predicted quickly by detecting keywords in the log data, which improves the efficiency of targeted maintenance of the server.

Further, by adopting the server fault prediction method provided by the invention, the log data generated by the server can be fully utilized, and the defect that operation and maintenance personnel cannot directly maintain the server according to the log data is overcome, so that the applicability of the log data is improved.

According to an embodiment of the present invention, there is provided a server failure prediction method embodiment, it being noted that the steps shown in the flowcharts of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.

In this embodiment, a server failure prediction method is provided, which may be used in the server described above, and fig. 1 is a flowchart of a server failure prediction method according to an embodiment of the present invention, as shown in fig. 1, where the flowchart includes the following steps:

Step S101, acquiring a log packet generated when the server runs.

In the embodiment of the invention, the log packet comprises a plurality of pieces of log data, from which the current running state of the server can be determined.

Step S102, keyword matching is carried out on each log data and a plurality of preset target keyword sets respectively, and at least one target fault type corresponding to the server is determined.

In the embodiment of the invention, different target keyword sets correspond to different fault types. Each target keyword set comprises a plurality of target keywords, which can be understood as the high-frequency keywords that appear in the log data when the corresponding fault type occurs. The target keyword sets can therefore serve as prior knowledge: in the subsequent prediction process, whether a piece of log data is normal can be determined by checking whether it contains target keywords.

Therefore, in order to detect whether the server behaves abnormally while running, each piece of log data is keyword-matched against the plurality of preset target keyword sets. Because different target keyword sets correspond to different fault types, and the pieces of log data in the log packet reflect the running condition of each hardware module of the server, at least one target fault type may be determined for the server.

Step S103, obtaining a fault prediction result of the server according to at least one target fault type.

According to the server fault prediction method, the possible fault types of the server in the running process can be rapidly predicted by means of keyword detection on the log data, so that the efficiency of targeted maintenance on the server is improved. Moreover, the log data generated by the server can be fully utilized, and the defect that operation and maintenance personnel cannot directly maintain the server according to the log data is overcome, so that the applicability of the log data is improved.

In this embodiment, a server failure prediction method is provided, which may be used for the server, and fig. 2 is a flowchart of a server failure prediction method according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:

Step S201, acquiring a log packet generated when the server runs. For details, refer to step S101 of the embodiment shown in FIG. 1, which are not repeated here.

Step S202, keyword matching is carried out on each log data and a plurality of preset target keyword sets respectively, and at least one target fault type corresponding to the server is determined.

Specifically, keyword matching against the plurality of target keyword sets follows the same principle for every piece of log data. Therefore, the process is described below for one piece of log data. Step S202 includes:

Step S2021, determining the hardware module corresponding to the log data, and determining the word set corresponding to the log data.

In this manner, because different hardware modules may give rise to different fault types during operation, the hardware module corresponding to the log data is determined so as to avoid interference from other hardware modules, and the word set that makes up the log data is determined. The server includes, but is not limited to, the following hardware modules: central processing unit (Central Processing Unit, CPU), memory, hard disk, network card, disk array (Redundant Array of Independent Disks, RAID), and the like.

Step S2022, determining, from the plurality of target keyword sets, a plurality of associated keyword sets corresponding to the hardware module.

In this manner, in order to improve the validity and accuracy of the matching, the associated keyword sets corresponding to the hardware module are determined from the plurality of target keyword sets. That is, each target keyword set has a corresponding hardware module, and several target keyword sets may correspond to the same hardware module. The associated keyword sets are therefore selected according to the correspondence between target keyword sets and hardware modules.

Step S2023, matching the word sets with the associated keyword sets respectively, and determining the target associated keyword set.

In this manner, the word set of the log data is matched against each associated keyword set, and the target associated keyword set is determined. The target associated keyword set is the associated keyword set that contains the largest number of words from the word set. That is, when many words in the word set coincide with the associated keywords in the target associated keyword set, the log data can be regarded as abnormal data under the fault type corresponding to that set, and that fault type can then be determined as the target fault type.

Step S2024, determining the fault type corresponding to the target association keyword set as the target fault type.

Through the method for determining at least one target fault type corresponding to the server, the target associated keyword set can be determined in a word matching mode, and therefore the situation of missing matching or mismatching is avoided.
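Steps S2021 to S2024 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the table `TARGET_KEYWORD_SETS`, the fault type names, and the rule that a piece of log data with no overlapping keyword is treated as normal (returning `None`) are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical prior knowledge: (hardware module, fault type) -> target keywords.
TARGET_KEYWORD_SETS: Dict[Tuple[str, str], Set[str]] = {
    ("memory", "memory_ecc_error"): {"ecc", "uncorrectable", "dimm"},
    ("memory", "memory_training_failure"): {"training", "failed", "dimm"},
    ("disk", "disk_media_error"): {"media", "error", "sector"},
}


def match_fault_type(module: str, words: Set[str]) -> Optional[str]:
    """Return the target fault type for one piece of log data, or None."""
    # S2022: keep only the keyword sets associated with this hardware module.
    associated = {
        fault: keywords
        for (mod, fault), keywords in TARGET_KEYWORD_SETS.items()
        if mod == module
    }
    best_fault: Optional[str] = None
    best_overlap = 0
    # S2023: pick the associated set that shares the most words with the log data.
    for fault, keywords in associated.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_fault, best_overlap = fault, overlap
    # S2024: the fault type of the best set is the target fault type.
    # Assumption: zero overlap means the log data is treated as normal.
    return best_fault
```

Restricting the candidates to the keyword sets of the same hardware module keeps, for example, a disk keyword set from matching a memory log line that happens to contain the word "error".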

Step S203, obtaining a fault prediction result of the server according to the at least one target fault type. For details, refer to step S103 of the embodiment shown in FIG. 1, which are not repeated here.

Specifically, the step S203 includes:

Step a1, if only one target fault type is determined, taking the target fault type as the fault prediction result of the server;

Step a2, if a plurality of target fault types are determined, taking, based on the number of occurrences of each target fault type, the target fault type with the largest number of occurrences as the fault prediction result of the server;

Step a3, if the target fault types have the same number of occurrences, taking the plurality of target fault types together as the fault prediction result of the server.

In this manner, since there are more log data in the log packet and not all log data are normal data, this results in that when there is abnormal data in the log packet, the determined target fault type may be one or more.

If only one target fault type is determined, only the target fault type is used for representing possible faults of the server, and the target fault type is used as a fault prediction result of the server.

If a plurality of target fault types are determined, the server may have several possible faults, and the fault prediction result can be determined based on the number of occurrences of each target fault type. The more often a target fault type occurs, the more likely that fault is and the more urgently it needs to be resolved, so the target fault type with the largest number of occurrences can be taken as the fault prediction result of the server.
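The selection rule of steps a1 to a3 can be sketched as a simple frequency count. The function name `predict_fault` and the choice to return a sorted list (so that ties yield every tied type) are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def predict_fault(target_fault_types: List[str]) -> List[str]:
    """Return the fault prediction result as a list of fault types.

    target_fault_types holds one entry per piece of log data that matched
    a fault type, so a type may appear several times.
    """
    if not target_fault_types:
        return []  # assumption: no matched fault type means nothing to report
    counts = Counter(target_fault_types)
    top = max(counts.values())
    # A single winner covers steps a1 and a2; several winners cover step a3.
    return sorted(fault for fault, n in counts.items() if n == top)
```

Returning a list in every case lets the caller handle a single result and a tie in the same way.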

According to the server fault prediction method, the log data can be detected in a targeted mode based on the log data and the hardware modules corresponding to the target keyword sets, so that the prediction effectiveness and accuracy are improved, and the server can be maintained in time.

In some optional embodiments, if a plurality of target fault types are determined, the fault prediction result of the server may also be determined based on the risk level corresponding to each target fault type. For example, the target fault type with the highest risk level is taken as the fault prediction result of the server, indicating that the server may have the fault corresponding to that type, so that it can subsequently be maintained in a targeted manner and the operational safety of the server is ensured.

In some implementation scenarios, if the determined target fault type is one and the risk level of the target fault type is extremely low, the risk level may be ignored, and the fault prediction result of the server is determined to be normal, thereby helping to save maintenance cost.
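The risk-level variant can be sketched in the same way. The risk table `RISK_LEVEL`, the fault type names, and the cut-off `IGNORE_AT_OR_BELOW` that stands in for "extremely low" are hypothetical values chosen only to make the example concrete.

```python
from typing import Dict, List

# Hypothetical risk levels; a larger value means a more severe fault.
RISK_LEVEL: Dict[str, int] = {
    "fan_speed_warning": 1,
    "disk_media_error": 3,
    "memory_ecc_error": 4,
}
# Hypothetical cut-off below which a lone fault type is ignored.
IGNORE_AT_OR_BELOW = 1


def predict_by_risk(target_fault_types: List[str]) -> str:
    """Return the highest-risk fault type, or "normal" when nothing is worth reporting."""
    distinct = sorted(set(target_fault_types))
    if not distinct:
        return "normal"
    if len(distinct) == 1:
        only = distinct[0]
        # A single, extremely low-risk fault type is ignored.
        return "normal" if RISK_LEVEL[only] <= IGNORE_AT_OR_BELOW else only
    # Several fault types: report the one with the highest risk level.
    return max(distinct, key=lambda fault: RISK_LEVEL[fault])
```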

In some optional embodiments, after the fault prediction result of the server is determined, a target diagnosis scheme corresponding to the fault prediction result is determined based on a preset correspondence between fault types and diagnosis schemes. This shortens the time needed to formulate the target diagnosis scheme, simplifies the process of determining it, lowers the technical requirements on operation and maintenance personnel, and thereby improves maintenance efficiency.

The following embodiment will specifically explain a generation process of determining a plurality of target keyword sets.

Fig. 3 is a flowchart of a method for generating a target keyword set according to an embodiment of the present invention, as shown in fig. 3, the flowchart includes the steps of:

Step S301, acquiring a plurality of log packet samples.

In an embodiment of the invention, history log packets of a server can be obtained through the baseboard management controller (Baseboard Management Controller, BMC) of the server. Each history log packet comprises a plurality of log files and the on-site real-time data of each hardware module collected when each log file reported an error.

Because the BMC records the collection time of a log packet when packaging it, the history log packets whose collection time falls within a specified time interval are taken as log packet samples in order to make the generated target keyword sets more effective, and a plurality of log packet samples are thereby obtained. Each log packet sample includes a plurality of pieces of problem log data. For example, the specified time interval may be the 24 hours before the log packet samples are obtained, or the week before they are obtained. With a 24-hour interval, the history log packets are close in time to the moment of acquisition and therefore reflect the real state of the server more faithfully. With a one-week interval, more history log packets are available, which helps ensure that enough valid log packet samples are acquired.
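Selecting samples by collection time can be sketched as a simple window filter. The `LogPacket` type, its field names, and the function `select_samples` are hypothetical; how the collection time is actually read from a BMC log packet is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class LogPacket:
    name: str
    collected_at: datetime  # collection time recorded when the packet was packaged


def select_samples(
    history: List[LogPacket],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> List[LogPacket]:
    """Keep the history log packets collected within `window` before `now`."""
    start = now - window
    return [p for p in history if start <= p.collected_at <= now]
```

Passing `timedelta(weeks=1)` as the window corresponds to the one-week interval mentioned above.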

In some alternative embodiments, to save acquisition time of multiple target keyword sets, data cleaning and preprocessing are performed on each issue log data to remove invalid information in each issue log data.
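What counts as invalid information is not specified here. As one possible reading, the sketch below drops timestamps and hexadecimal addresses and reduces a log line to a lowercase word set. The regular expressions, the function name `to_word_set`, and the sample log line are all assumptions.

```python
import re
from typing import Set

# Assumed patterns for information that carries no fault-type signal.
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")
_HEX_VALUE = re.compile(r"0x[0-9a-fA-F]+")
# A word starts with a letter and may continue with letters, digits or underscores.
_WORD = re.compile(r"[a-z][a-z0-9_]*")


def to_word_set(line: str) -> Set[str]:
    """Clean one piece of problem log data and return its word set."""
    line = _TIMESTAMP.sub(" ", line)
    line = _HEX_VALUE.sub(" ", line)
    return set(_WORD.findall(line.lower()))
```

Removing volatile fields such as timestamps also makes otherwise identical lines compare equal, which matters when repetitions are counted later.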

Step S302, based on the number of the problem log data in each log packet sample and the attribute information of each problem log data, clustering is carried out on a plurality of pieces of log data in each log packet sample, so as to obtain a plurality of middle clustering clusters corresponding to each log packet sample.

In an embodiment of the present invention, different log packet samples involve different problem log data. In order to fully discover the characteristics of the log data generated by the server under the same fault type, clustering is first performed packet by packet: based on the number of pieces of problem log data in each log packet sample and the attribute information of each piece of problem log data, the plurality of pieces of log data in each log packet sample are clustered to obtain a plurality of intermediate clusters corresponding to each log packet sample.

Step S303, a plurality of intermediate cluster clusters corresponding to every two log packet samples are aggregated based on a cluster distance threshold value, and a plurality of target cluster clusters are generated.

In the embodiment of the invention, each log packet sample corresponds to a plurality of intermediate clusters. In order to extend the clusters across packets, cross-packet aggregation is performed: the intermediate clusters corresponding to every two log packet samples are aggregated based on a cluster distance threshold to generate a plurality of target clusters. The cluster distance threshold can be understood as the maximum distance at which two intermediate clusters are still considered to belong to the same cluster. For example, the cluster distance between two intermediate clusters that do not belong to the same log packet sample can be determined by average distance clustering, and the formula for average distance clustering can be as follows:

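$$d_{avg}(C_i, C_j) = \frac{1}{x \cdot z} \sum_{p \in C_i} \sum_{q \in C_j} d(p, q)$$

where C_i and C_j represent two different intermediate clusters, x represents the number of pieces of problem log data in C_i, z represents the number of pieces of problem log data in C_j, d(p, q) represents the distance between a piece of problem log data p in C_i and a piece of problem log data q in C_j, and d_avg(C_i, C_j) represents the cluster distance between two intermediate clusters that do not belong to the same log packet sample.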

If the cluster distance is less than or equal to the cluster distance threshold, the two intermediate clusters are aggregated into the same cluster. If the cluster distance is greater than the cluster distance threshold, the two intermediate clusters are not aggregated into the same cluster.

By analogy, once all intermediate clusters of all log packet samples have been processed, the plurality of target clusters is generated.
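The cross-packet aggregation of step S303 can be sketched for two log packet samples as follows. The cluster distance is taken as the mean of all pairwise distances between the members of two clusters, which is the usual meaning of average distance clustering. The pairwise distance itself is not specified in this text, so the Jaccard distance between word sets is used purely as an assumption, as are the function names and the greedy merge order.

```python
from itertools import product
from typing import FrozenSet, List

# A cluster is a non-empty list of word sets, one per piece of problem log data.
Cluster = List[FrozenSet[str]]


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Assumed pairwise distance: 1 minus the Jaccard similarity of two word sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_distance(ci: Cluster, cj: Cluster) -> float:
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(p, q) for p, q in product(ci, cj))
    return total / (len(ci) * len(cj))


def aggregate(clusters_a: List[Cluster], clusters_b: List[Cluster],
              threshold: float) -> List[Cluster]:
    """Merge the intermediate clusters of packet B into those of packet A."""
    targets = [list(c) for c in clusters_a]
    n_a = len(clusters_a)
    for cb in clusters_b:
        # Only compare against clusters that originate from the other packet.
        distances = [average_distance(t, cb) for t in targets[:n_a]]
        if distances and min(distances) <= threshold:
            targets[distances.index(min(distances))].extend(cb)
        else:
            targets.append(list(cb))
    return targets
```

With more than two log packet samples, the same function could be applied pairwise, feeding the result of one call into the next.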

Step S304, extracting target keywords in each target cluster respectively to obtain a target keyword set corresponding to each target cluster.

In the embodiment of the invention, since a target cluster comprises a plurality of pieces of problem log data, words that occur frequently are taken as target keywords according to the number of occurrences of each word across the problem log data, in order to reduce data redundancy. The target keyword set corresponding to the target cluster is thereby obtained, so that the target keyword sets can later be used to detect whether the log data of a server is normal.
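Frequency-based keyword extraction for one target cluster can be sketched as follows. The thresholds `min_count` and `top_k`, which decide what counts as a frequently occurring word, are hypothetical parameters, and occurrences are counted once per piece of problem log data because each piece is represented by its word set.

```python
from collections import Counter
from typing import Iterable, Set


def extract_keywords(cluster_word_sets: Iterable[Set[str]],
                     min_count: int = 2, top_k: int = 10) -> Set[str]:
    """Return the most frequent words of a target cluster as its keyword set."""
    counts = Counter(word for words in cluster_word_sets for word in words)
    frequent = [(word, n) for word, n in counts.items() if n >= min_count]
    # Most frequent first; ties are broken alphabetically for reproducibility.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return {word for word, _ in frequent[:top_k]}
```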

According to the server fault prediction method, the association relation among the problem log data can be fully mined based on the problem log data in the log packet samples, a plurality of target keyword sets are obtained, and then the log data generated by the server can be fully utilized when whether the log data of the server are normal or not is detected, so that whether the server has faults or not can be predicted in time, and the effectiveness and timeliness of targeted maintenance on the server are improved.

In some optional embodiments, the method for generating a target keyword set further includes:

Step S305, respectively obtaining the fault types corresponding to the log packet samples, and associating each fault type with the plurality of pieces of problem log data included in the corresponding log packet sample.

In the embodiment of the invention, when each log packet sample is acquired, the fault type corresponding to each log packet sample is acquired together, and then the fault type is associated with a plurality of pieces of problem log data included in the corresponding log packet sample, so that the fault type corresponding to the target cluster is determined later.

Step S306, determining the fault type corresponding to the target cluster based on the fault type corresponding to each problem log data in the target cluster.

In the embodiment of the invention, each problem log data in the target cluster can be understood as similar data, but the fault types corresponding to different problem log data can be different, so that the fault types corresponding to the target cluster can be determined by combining the fault types corresponding to each problem log data.

For example, the fault type corresponding to the target cluster can be determined based on the priority of each fault type involved in the target cluster, or based on the number of occurrences of each fault type. The specific determination process can be set by the relevant personnel and is not limited in the present invention.
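Since the determination process is left open, the following sketch shows only one possible rule: take the fault type carried by the most pieces of problem log data in the cluster, and break ties by priority. The priority table `FAULT_PRIORITY`, the fault type names, and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical priorities; a larger value wins a tie.
FAULT_PRIORITY: Dict[str, int] = {
    "memory_ecc_error": 2,
    "memory_training_failure": 1,
}


def cluster_fault_type(fault_labels: List[str]) -> str:
    """Pick the fault type of a target cluster from its members' fault types.

    fault_labels must be non-empty and holds one fault type per piece of
    problem log data in the cluster.
    """
    counts = Counter(fault_labels)
    # Compare by occurrence count first, then by priority.
    return max(counts, key=lambda f: (counts[f], FAULT_PRIORITY.get(f, 0)))
```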

With this embodiment, the type of fault that the server may develop can be predicted quickly when the log data of the server is examined later, which improves prediction efficiency and saves prediction time.

In some alternative embodiments, the log packet sample includes a plurality of problem log files; the attribute information includes the word set corresponding to each piece of problem log data and the hardware module corresponding to each piece of problem log data, and step S303 includes:

Step b1, clustering the pieces of problem log data in a problem log file based on the word set of each piece of problem log data in the problem log file, the number of repetitions of each piece of problem log data in the problem log file, and the hardware module corresponding to each piece of problem log data, to obtain a plurality of initial clusters.

In the embodiment of the invention, the log packet sample comprises a plurality of problem log files, the same problem log file comprises a plurality of problem log data, and different problem log data can correspond to the same hardware module.

To reduce the clustering difficulty, the pieces of problem log data in the same problem log file are clustered with the file as the unit. To reduce the influence of different hardware modules on clustering, the pieces of problem log data in the problem log file are clustered based on the word set of each piece of problem log data, the number of repetitions of each piece of problem log data in the problem log file, and the hardware module corresponding to each piece of problem log data, to obtain a plurality of initial clusters.

In some alternative embodiments, step b1 includes:

Step b11, obtaining a plurality of problem log data sets according to the hardware modules corresponding to the pieces of problem log data;

Step b12, counting the number of pieces of problem log data in each problem log data set;

Step b13, clustering the pieces of problem log data in each problem log data set based on the word set of each piece of problem log data, the number of repetitions of each piece of problem log data in the corresponding problem log data set, and the data quantity of the corresponding problem log data set, to obtain a plurality of initial clusters.

In this manner, when clustering is performed, each problem log data is further divided according to the corresponding hardware module, and when clustering is performed subsequently, the initial cluster clusters corresponding to each hardware module can be isolated from each other.

In order to facilitate targeted clustering, a plurality of problem log data in the problem log file are divided according to corresponding hardware modules, so that a plurality of problem log data sets are obtained. Counting the data quantity of the problem log data in each problem log data set, and clustering the problem log data in each problem log data set based on the word set of each problem log data, the repeated quantity of each problem log data in the corresponding problem log data set and the data quantity of the corresponding problem log data set to obtain a plurality of initial clustering clusters.
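The partitioning and counting in steps b11 and b12 can be sketched as below. The input shape (pairs of hardware module name and log line) and the function name `group_by_module` are assumptions made for illustration; how the hardware module of a line is detected is not shown.

```python
from collections import defaultdict

def group_by_module(entries):
    """Split problem log data into one data set per hardware module.

    entries: iterable of (hardware_module, log_line) pairs (assumed shape).
    Returns (data sets keyed by module, data quantity of each data set).
    """
    groups = defaultdict(list)
    for module, line in entries:
        groups[module].append(line)
    sizes = {module: len(lines) for module, lines in groups.items()}
    return dict(groups), sizes
```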

In some alternative embodiments, step b13 includes:

Step b131, performing whole-line matching between the current problem log data and each other piece of problem log data in the problem log data set to obtain a first statistic, wherein the first statistic is the number of other pieces of problem log data that are identical to the current problem log data;

Step b132, performing word matching between the word set of the current problem log data and the word set of each other piece of problem log data in the problem log data set to obtain a second statistic, wherein the second statistic is the number of those word sets that contain a word identical to a word in the word set of the current problem log data;

Step b133, determining the total number of words of the problem log data set based on the word set of each piece of problem log data;

Step b134, clustering the pieces of problem log data in the problem log data set based on the first statistic, the data quantity, the second statistic and the total number of words, to obtain a plurality of initial clusters.

Specifically, the formula used to cluster the current problem log data with the other problem log data in the problem log data set may be as follows:

where k is the total number of data items in the problem log data set after duplicate problem log data is removed.

The word set may be obtained by performing data processing on the corresponding problem log data. A specific process may include removing the timestamp information from the problem log data and splitting the remainder on spaces, thereby obtaining the word set.
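A minimal sketch of this preprocessing is given below. The timestamp layout (a leading `YYYY-MM-DD HH:MM:SS`, optionally in square brackets) is an assumption, since the log format is not specified; a real log source would need its own pattern.

```python
import re

# Assumed timestamp layout at the start of a line, e.g. "2023-06-06 12:00:01".
TIMESTAMP = re.compile(r"^\[?\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\]?\s*")

def word_set(line):
    """Strip the leading timestamp and split the rest on whitespace."""
    return set(TIMESTAMP.sub("", line).split())
```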

In other alternative embodiments, different weights may be configured for different hardware modules in advance. After the hardware module corresponding to each piece of problem log data is determined, the weight of that hardware module is assigned to the problem log data, so that during subsequent clustering the problem log data of different hardware modules can be distinguished by weight, which ensures that the problem log data in any initial cluster, intermediate cluster or target cluster belongs to the same hardware module. For example, the module weight may be $R_{M_i} = 10^{2i}$, $i \in (1, n]$, where i is the current hardware module and n is the total number of hardware modules the server includes.
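As an illustration of the module weight, the sketch below assigns a weight of 10 to the power 2i to a log line according to the first module keyword it contains. The module list, its ordering, indexing from 1, and substring matching are all hypothetical choices; the document does not specify how module keywords are detected.

```python
# Hypothetical module keywords; the index i starts at 1 (assumption).
MODULES = ["cpu", "memory", "disk", "fan", "psu", "nic", "bmc"]

def module_weight(i):
    """Module weight 10^(2i); weights of different modules differ by at least 100x."""
    return 10 ** (2 * i)

def weight_for_line(line):
    """Weight of the first module whose keyword appears in the line, else 0."""
    lowered = line.lower()
    for i, name in enumerate(MODULES, start=1):
        if name in lowered:
            return module_weight(i)
    return 0
```

Because consecutive weights differ by a factor of 100, data from different modules is pushed far apart in any distance that includes the weight, which is one way to read the isolation effect described above.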

In still other optional embodiments, the time weight of each problem log data may be determined based on the generation time of the problem file corresponding to each problem log data, and then the above clustering formula is combined to obtain a clustering formula for clustering each problem log data in the same problem file as follows:

Furthermore, clustering in this way saves clustering time, realizes hardware module isolation automatically, and avoids falling into a local optimum.

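In one implementation scenario, the time weight $R_t$ may be configured as follows:

$$R_t = \begin{cases} 1, & \Delta t \le 24\ \text{h} \\ 0.5, & 24\ \text{h} < \Delta t \le 7 \times 24\ \text{h} \\ 0, & \Delta t > 7 \times 24\ \text{h} \end{cases}$$

That is, with $\Delta t$ denoting the age of the problem log data, the time weight of problem log data within 24 hours is 1, the time weight of problem log data between 24 hours and 7×24 hours is 0.5, and the time weight of problem log data older than 7×24 hours is 0.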
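The time weight can be written as a small step function. Treating the boundaries at exactly 24 hours and 7×24 hours as inclusive is an assumption; the text does not say which side they fall on.

```python
def time_weight(age_hours):
    """Time weight of problem log data given its age in hours."""
    if age_hours <= 24:
        return 1.0
    if age_hours <= 7 * 24:
        return 0.5
    return 0.0
```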

Step b2, merging the plurality of initial clusters corresponding to every two problem log files according to the target number of clusters, to generate that number of intermediate clusters. The clustering process is similar to the specific implementation of step S303 in fig. 3 and is not described in detail here.

Through the embodiment, the clustering effect is improved, the breadth of the target keywords can be improved better, and therefore the accuracy of fault prediction is guaranteed.

In some alternative embodiments, the problem log data is log data that includes a default keyword. Default keywords may include: error, fail or rolling. The problem log file may retain only the pieces of problem log data and discard other useless data, which reduces the amount of computation in the subsequent clustering.

In other alternative embodiments, to improve accuracy, the log data within a specified number of lines above and below each piece of problem log data in the original log file may also be retained in the problem log file.
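The screening described in the two paragraphs above can be sketched as follows. Case-insensitive substring matching and the `context` parameter name are assumptions; the default keywords are the ones listed in the text.

```python
DEFAULT_KEYWORDS = ("error", "fail", "rolling")

def extract_problem_lines(lines, context=0):
    """Keep lines containing a default keyword, plus `context` lines
    above and below each hit, preserving the original order."""
    keep = set()
    for idx, line in enumerate(lines):
        if any(keyword in line.lower() for keyword in DEFAULT_KEYWORDS):
            start = max(0, idx - context)
            stop = min(len(lines), idx + context + 1)
            keep.update(range(start, stop))
    return [lines[i] for i in sorted(keep)]
```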

In an implementation scenario, the process of obtaining multiple target clusters may be as follows:

1) The problem log data in the log packet sample is screened by the default keywords (key). To reduce the influence between different hardware modules, if a piece of problem log data is detected to contain a module keyword, the corresponding module weight is added. For example, for a server including 7 hardware modules, to prevent clustering across different modules, the module weight may be configured as $R_{M_i} = 10^{2i}$, $i \in (1, 7]$, where i is the current hardware module. The fault type corresponding to the log packet sample is R, and the diagnosis scheme is S.

2) The log packet sample is processed. The log packet sample includes a plurality of log files $l_i$, $i \in (0, n]$, where n is the total number of problem log files in the log packet sample.

Each problem log file may contain multiple errors. The problem log data generated when each error occurs in a problem log file is extracted, and its time is compared with the time of the log packet: if the difference is less than 24 hours, the log has the highest importance; if it is between 1 day and 7 days, the importance is secondary; if it exceeds 7 days, the log is of little significance. Useless data is rejected to form the error set $l_{ij}$, $i \in (0, n]$, $j \in (0, K]$, where n is the total number of problem log files in the log packet sample, i is the current problem log file, j is the error number of the current problem log data, and K is the data quantity of the problem log data set. The error sets $l_{tij}$, $t \in (1, W]$, of the plurality of log packet samples are then obtained, where t indexes the log packet samples up to their total number W.

Data processing is performed on each piece of problem log data to obtain its word set and the total number of words of each problem log file. The file weight η of each problem log file is determined according to the repetition degree of each word. During clustering, if pieces of problem log data match as whole lines, the cohesion of the clusters can be ensured by increasing the parameter through an influence factor p.

The formula used to cluster the current problem log data with the other problem log data in the problem log data set may be as follows:

In response to completion of the clustering of each piece of problem log data, a plurality of intermediate clusters of the current log packet sample are obtained.

3) The cluster distances between the 1st log packet sample and each of the second and third log packet samples are calculated in turn. For example, the number k of clusters of the current log packet sample is set (generally to 15), the average distance between clusters is calculated, and a threshold is set on the cluster distance $d_{avg}(C_i, C_j)$, which reflects the similarity of two clusters. Here the threshold is set to 5 according to an empirical value; if the distance is greater than 5, the two clusters are considered not to belong together, wherein:

After the calculation is completed, the clusters with the shortest distance are merged and the distance matrix of the clusters is updated, until K clusters are obtained.
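The merge loop described above can be sketched as a simple agglomerative procedure. Several assumptions are made here: each cluster is a list of word sets, the distance between two items is the Jaccard distance (the document's own distance formula is not reproduced, so its empirical threshold of 5 does not apply to this scale), and distances are recomputed on every pass rather than kept in an updated distance matrix.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Stand-in item distance (assumption): 1 - |a & b| / |a | b|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def average_distance(c1, c2, dist=jaccard_distance):
    """Average pairwise distance between the members of two clusters."""
    return sum(dist(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

def merge_clusters(clusters, k, threshold, dist=jaccard_distance):
    """Merge the closest pair repeatedly until k clusters remain or the
    closest pair is farther apart than `threshold`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > max(k, 1):
        pairs = combinations(range(len(clusters)), 2)
        i, j = min(pairs, key=lambda p: average_distance(clusters[p[0]], clusters[p[1]], dist))
        if average_distance(clusters[i], clusters[j], dist) > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because i < j
    return clusters
```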

Through calculation, a cluster $C_1$ is formed, and at the same time the cluster is labeled according to the fault type and the diagnosis result uploaded by the operation and maintenance personnel.

Assuming that $C_k$ clusters are formed, the keywords with a high proportion in each cluster are extracted and output as diagnosis keywords.
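One way to read "keywords with a high proportion" is the share of cluster members that contain a word. The sketch below uses that reading; the function name and the `min_ratio` cutoff are assumptions, as the document gives no threshold.

```python
from collections import Counter

def cluster_keywords(word_sets, min_ratio=0.5):
    """Words that appear in at least `min_ratio` of a cluster's word sets."""
    counts = Counter(word for words in word_sets for word in words)
    total = len(word_sets)
    return {word for word, count in counts.items() if count / total >= min_ratio}
```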

The log packet of a given server is collected and imported into the diagnosis device, and the server log is diagnosed according to the diagnosis rules produced above; if a log matching the rules exists, the corresponding handling method is given.

In another implementation scenario, as shown in fig. 4, the generation system of the plurality of target clusters may include: log collection device, log analysis device, diagnostic device.

The log collection device collects the various logs monitored by the BMC and associates them with the log packet data along the time dimension. The log analysis device performs data cleaning and preprocessing, feature extraction of the problem logs, model building and data training. During data training, the clusters are output through multiple iterations. The finally output diagnosis rules for the associated logs are confirmed by manual diagnosis to form the diagnosis rules. When operation and maintenance personnel obtain a log on site, they upload it to the diagnosis module, which outputs the problems diagnosed from that log and the corresponding solutions.

The process of predicting server failure may be as follows:

The log collection device acquires a plurality of history log packets generated by the server. The plurality of history log packets are input into the log analysis device, which performs data processing on them to obtain a plurality of target keyword sets and a plurality of target correspondences. A target correspondence is the correspondence between a target fault type and its diagnosis scheme. The plurality of target keyword sets and the plurality of target correspondences are stored in the diagnosis device.

In the running process of the server, the log collecting device inputs a log packet generated during the running of the server into the diagnosis device to conduct fault prediction, and outputs a fault prediction result and a corresponding target diagnosis scheme.

The embodiment also provides a server fault prediction device, which is used for implementing the foregoing embodiments and preferred embodiments, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware modules that implement the predetermined function. While the apparatus described in the following embodiments is preferably implemented in software, implementation of hardware modules, or a combination of software and hardware modules, is also possible and contemplated.

The present embodiment provides a server failure prediction apparatus, as shown in fig. 5, including:

The first obtaining module 501 is configured to obtain a log packet generated during running of the server, where the log packet includes a plurality of pieces of log data.

The matching module 502 is configured to match each log data with a preset plurality of target keyword sets, determine at least one target fault type corresponding to the server, where different target keyword sets correspond to different fault types.

The first processing module 503 is configured to obtain a failure prediction result of the server according to at least one target failure type.

In some alternative embodiments, the matching module 502 includes: a first determining unit, configured to determine the hardware module corresponding to the log data and the word set corresponding to the log data; a second determining unit, configured to determine, from the plurality of target keyword sets, a plurality of associated keyword sets corresponding to the hardware module; a first matching unit, configured to match the word set against each associated keyword set to determine a target associated keyword set, where the target associated keyword set is the associated keyword set that contains the largest number of words from the word set; and a first execution unit, configured to determine the fault type corresponding to the target associated keyword set as the target fault type.
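The matching performed by the first matching unit and the first execution unit can be sketched as an overlap count. The mapping from fault type to keyword set, the example fault type names in the usage, and returning `None` when nothing overlaps are assumptions for illustration.

```python
def match_fault_type(words, keyword_sets):
    """Return the fault type whose keyword set shares the most words with
    `words`, or None if no keyword set overlaps at all.

    keyword_sets: mapping fault type -> associated keyword set for the
    hardware module of this log line (assumed shape).
    """
    best_type, best_hits = None, 0
    for fault_type, keywords in keyword_sets.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_type, best_hits = fault_type, hits
    return best_type
```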

In some alternative embodiments, the first processing module 503 includes: a first processing unit, configured to take the target fault type as the fault prediction result of the server if only one target fault type is determined; a second processing unit, configured to, if multiple target fault types are determined, take the target fault type that occurs most often as the fault prediction result of the server, based on the number of occurrences of each target fault type; and a third processing unit, configured to take the multiple target fault types as the fault prediction result of the server if they occur the same number of times.
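The three cases handled by the processing units reduce to "return every fault type with the highest count". A minimal sketch, with a hypothetical function name and an empty list for the no-match case, which the text does not cover:

```python
from collections import Counter

def predict(target_fault_types):
    """Fault prediction result from the target fault types found per log line.

    One type -> that type; several -> the most frequent one;
    equal counts -> all tied types (in first-seen order).
    """
    counts = Counter(target_fault_types)
    if not counts:
        return []
    top = max(counts.values())
    return [fault_type for fault_type, count in counts.items() if count == top]
```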

In some alternative embodiments, the apparatus further comprises: the scheme determining module is used for determining a target diagnosis scheme corresponding to the fault prediction result based on the corresponding relation between the preset fault types and the diagnosis schemes.

In some alternative embodiments, the apparatus further comprises: the second acquisition module is used for acquiring a plurality of log packet samples, wherein the log packet samples comprise a plurality of pieces of problem log data; the first clustering module is used for carrying out clustering processing on a plurality of pieces of log data in each log packet sample based on the quantity of the problem log data in each log packet sample and attribute information of each problem log data to obtain a plurality of middle clustering clusters corresponding to each log packet sample; the aggregation module is used for aggregating a plurality of middle cluster clusters corresponding to every two log packet samples based on a cluster distance threshold value to generate a plurality of target cluster clusters; the word extraction module is used for respectively extracting target keywords in each target cluster to obtain target keyword sets corresponding to each target cluster.

In some alternative embodiments, the apparatus further comprises: the association module is used for respectively acquiring the fault types corresponding to the log packet samples and associating the fault types with a plurality of pieces of problem log data included in the corresponding log packet samples; the second processing module is used for determining the fault type corresponding to the target cluster based on the fault type corresponding to the problem log data in the target cluster.

In some alternative embodiments, the log packet sample includes a plurality of problem log files; the attribute information includes the word set corresponding to each piece of problem log data and the hardware module corresponding to each piece of problem log data; the first clustering module includes: a second execution unit, configured to cluster the pieces of problem log data in a problem log file based on the word set of each piece of problem log data in the problem log file, the number of repetitions of each piece of problem log data in the problem log file, and the hardware module corresponding to each piece of problem log data, to obtain a plurality of initial clusters; and a third execution unit, configured to merge the plurality of initial clusters corresponding to every two problem log files according to the target number of clusters, to generate that number of intermediate clusters.

In some alternative embodiments, the second execution unit includes: a merging unit, configured to obtain a plurality of problem log data sets according to the hardware modules corresponding to the pieces of problem log data, where different problem log data sets correspond to different hardware modules; a first statistics unit, configured to count the number of pieces of problem log data in each problem log data set; and a clustering unit, configured to cluster the pieces of problem log data in each problem log data set based on the word set of each piece of problem log data, the number of repetitions of each piece of problem log data in the corresponding problem log data set, and the data quantity of the corresponding problem log data set, to obtain a plurality of initial clusters.

In some alternative embodiments, the clustering unit includes: a second statistics unit, configured to perform whole-line matching between the current problem log data and each other piece of problem log data in the problem log data set to obtain a first statistic, where the first statistic is the number of other pieces of problem log data that are identical to the current problem log data; a third statistics unit, configured to perform word matching between the word set of the current problem log data and the word set of each other piece of problem log data in the problem log data set to obtain a second statistic, where the second statistic is the number of those word sets that contain a word identical to a word in the word set of the current problem log data; a fourth statistics unit, configured to determine the total number of words of the problem log data set based on the word set of each piece of problem log data; and a clustering subunit, configured to cluster the pieces of problem log data in the problem log data set based on the first statistic, the data quantity, the second statistic and the total number of words, to obtain a plurality of initial clusters.

Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.

The server fault prediction device in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above-described functions.

The embodiment of the invention also provides a server, which is provided with the server fault prediction device shown in the figure 5.

Referring to fig. 6, fig. 6 is a schematic structural diagram of a server according to an alternative embodiment of the present invention. As shown in fig. 6, the server includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the server, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display apparatus coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple servers may be connected, with each server providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 6.

The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware module chip. The hardware module chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.

Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in implementing the above embodiments.

The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the server, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.

The server further comprises an input device 30 and an output device 40. The processor 10, the memory 20, the input device 30 and the output device 40 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 6.

The input device 30 may receive entered numeric or character information and generate key signal inputs related to user settings and function control of the server; examples include a touch screen, keypad, mouse, trackpad, touchpad, pointer stick, one or more mouse buttons, trackball, joystick, and the like. The output device 40 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light-emitting diode displays and plasma displays. In some alternative implementations, the display device may be a touch screen.

The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in a hardware module or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored on a remote storage medium or a non-transitory machine-readable storage medium and is downloaded through a network to be stored on a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or a programmable or special-purpose hardware module. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware module includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware module, implements the methods illustrated by the above embodiments.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. A method for predicting server failure, the method comprising:

acquiring a log packet generated when the server runs, wherein the log comprises a plurality of pieces of log data;

and obtaining a fault prediction result of the server according to the at least one target fault type.

2. The method according to claim 1, wherein the keyword matching is performed on each log data with a preset plurality of target keyword sets, and determining at least one target fault type corresponding to the server includes:

respectively matching the word set with each associated keyword set to determine a target associated keyword set, wherein the target associated keyword set is the associated keyword set with the largest number of words in the word sets;

3. The method according to claim 2, wherein said obtaining a failure prediction result of the server according to the at least one target failure type comprises:

if only one target fault type is determined, taking the target fault type as the fault prediction result of the server;

and if the number of the target fault types is the same, taking the multiple target fault types as the fault prediction results of the server.

4. A method according to claim 3, characterized in that the method further comprises:

and determining a target diagnosis scheme corresponding to the fault prediction result based on the preset correspondence between a plurality of fault types and the diagnosis schemes.

5. The method according to claim 1, wherein the method further comprises:

6. The method of claim 5, wherein the method further comprises:

7. The method according to claim 5 or 6, wherein,

the log package sample comprises a plurality of problem log files; the attribute information comprises word sets corresponding to the problem log data and hardware modules corresponding to the corresponding problem log data;

the clustering processing is performed on the plurality of pieces of log data in each log packet sample based on the number of the problem log data in each log packet sample and the attribute information of each problem log data, so as to obtain a plurality of middle clustering clusters corresponding to each log packet sample, including:

clustering each question log data in the question log file based on a word set of each question log data in the question log file, the repetition number of each question log data in the question log file and a hardware module corresponding to each question log data to obtain a plurality of initial cluster clusters;

and merging a plurality of initial cluster clusters corresponding to every two problem log files according to the number of target cluster clusters to generate intermediate cluster clusters of the number of target cluster clusters.

8. The method of claim 7, wherein the clustering each of the issue log data in the issue log file based on the word set of each of the issue log data in the issue log file, the number of repetitions of each of the issue log data in the issue log file, and the hardware module corresponding to each of the issue log data, to obtain a plurality of initial clusters comprises:

obtaining a plurality of problem log data sets according to the hardware modules corresponding to the problem log data, wherein different problem log data sets correspond to different hardware modules;

clustering each of the problem log data in each of the problem log data sets based on a word set of each of the problem log data, a repetition number of each of the problem log data in a corresponding one of the problem log data sets, and a data number of the corresponding one of the problem log data sets, to obtain a plurality of initial clusters.

9. The method of claim 8, wherein clustering each of the issue log data in each of the issue log data sets based on the word set of each of the issue log data, the number of repetitions of each of the issue log data in the corresponding issue log data set, and the number of data of the corresponding issue log data set, to obtain a plurality of initial cluster clusters, comprises:

carrying out full-line matching on the current problem log data and other problem log data in the problem log data set respectively to obtain a first statistical value, wherein the first statistical value is the number of data which is completely overlapped with the current problem log data in the other problem log data;

word matching is carried out on the word set of the current problem log data and the word sets of other problem log data in the problem log data respectively, so that a second statistical value is obtained, and the second statistical value is the number of word sets, in which words identical to the word sets exist, in each word set;

determining a total number of words of the issue log data set based on the word set of each of the issue log data;

and clustering each of the problem log data in the problem log data set based on the first statistical value, the data quantity, the second statistical value and the total number of words to obtain a plurality of initial clustering clusters.

10. A server failure prediction apparatus, the apparatus comprising:

the first acquisition module is used for acquiring a log packet generated when the server runs, wherein the log comprises a plurality of pieces of log data;

the matching module is used for matching keywords of the log data with a plurality of preset target keyword sets respectively, determining at least one target fault type corresponding to the server, and enabling different target keyword sets to correspond to different fault types;

and the first processing module is used for obtaining a fault prediction result of the server according to the at least one target fault type.

11. A server, comprising:

a memory and a processor communicatively coupled to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the server failure prediction method of any of claims 1 to 9.

12. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the server failure prediction method of any one of claims 1 to 9.