CN115878415A - Cluster server intelligent fault prediction method, system, terminal and storage medium

Info

Publication number: CN115878415A
Application number: CN202211433292.9A
Authority: CN
Inventors: 张嘉谣; 牛玉峰; 陈亮甫
Original assignee: Xian Chaoyue Shentai Information Technology Co Ltd
Current assignee: Xian Chaoyue Shentai Information Technology Co Ltd
Priority date: 2022-11-16
Filing date: 2022-11-16
Publication date: 2023-03-31

Abstract

The invention relates to the technical field of servers, and in particular to an intelligent fault prediction method, system, terminal and storage medium for a cluster server. The method comprises the following steps: acquiring real-time information of a device; analyzing loss state data of the device according to the real-time information, and analyzing the loss state data to obtain device state data; and predicting device faults according to the device state data. The invention detects and evaluates the condition of the components in the server, so that faults can be found in time.

Description

Cluster server intelligent fault prediction method, system, terminal and storage medium

Technical Field

The invention relates to the technical field of servers, in particular to an intelligent fault prediction method, an intelligent fault prediction system, a terminal and a storage medium for a cluster server.

Background

A server is a type of computer that runs faster, carries a heavier load, and costs more than an ordinary computer. A server provides computing or application services to other clients in the network (terminals such as PCs, smartphones and ATMs, and even large installations such as train systems). A server offers high-speed CPU computing capability, long-term reliable operation, strong external I/O data throughput, and good expandability.

To this end, an intelligent fault prediction method, system, terminal and storage medium for a cluster server are provided.

Disclosure of Invention

In order to solve the technical problems in the prior art, the invention provides an intelligent failure prediction method, an intelligent failure prediction system, a terminal and a storage medium for a cluster server.

In order to achieve the above purpose, the embodiment of the present invention provides the following technical solutions:

In a first aspect, an embodiment of the present invention provides an intelligent fault prediction method for a cluster server, the method including the following steps:

acquiring real-time information of a device;

analyzing loss state data of the device according to the real-time information of the device, and analyzing the loss state data of the device to obtain device state data; and

predicting device faults according to the device state data.

As a further scheme of the invention, the real-time information of the equipment comprises data such as vibration conditions, temperature changes, current stability and the like.

As a further scheme of the present invention, analyzing the loss state data of the device according to the real-time information of the device, and analyzing the loss state data to obtain the device state data, includes: analyzing the loss state data of the device through a BP neural network prediction model to obtain the device state data.

As a further scheme of the invention, the construction of the BP neural network prediction model includes the following steps:

S201, constructing a BP neural network prediction model according to the vibration condition, temperature change and current stability elements;

S202, establishing a sample data set of the form {(vibration condition, temperature change, current stability), fault type};

S203, normalizing the values of the sample data set with a normalization formula so that they lie in the range 0 to 1;

S204, inputting the sample data set into the constructed BP neural network prediction model, and outputting fault type data;

S205, determining the calculation error according to the fault type data, and adjusting the hidden-layer-to-output-layer weights and the input-layer-to-hidden-layer weights of the BP neural network prediction model;

S206, repeating steps S204 to S205 until the error meets the set value.

As a further scheme of the present invention, determining the calculation error according to the fault type data and adjusting the hidden-layer-to-output-layer and input-layer-to-hidden-layer weights of the BP neural network prediction model includes: calculating the error of the model by the least squares method, and updating the weights layer by layer from back to front by gradient descent.
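The error measure and update rule named above can be written out as follows. This is the standard textbook form for a BP network, given only as a sketch: the symbols $y_k$ (target output), $\hat{y}_k$ (network output), $\eta$ (learning rate), $w_{jk}$ (hidden-to-output weights) and $v_{ij}$ (input-to-hidden weights) are notation assumed here and are not defined in the source.

```latex
E = \frac{1}{2}\sum_{k}\left(y_k - \hat{y}_k\right)^2,
\qquad
w_{jk} \leftarrow w_{jk} - \eta\,\frac{\partial E}{\partial w_{jk}},
\qquad
v_{ij} \leftarrow v_{ij} - \eta\,\frac{\partial E}{\partial v_{ij}}
```

Updating "from back to front" then means that the gradients for $w_{jk}$ are computed first and propagated backward to obtain the gradients for $v_{ij}$.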

As a further scheme of the present invention, step S30, predicting device faults according to the device state data, includes the following steps:

S301, generating a state transition probability distribution matrix for each associated component;

S302, predicting the state of the associated component according to a Markov chain.

As a further scheme of the present invention, step S301, generating the state transition probability distribution matrix of each associated component, includes:

S3011, selecting a history period width T, and acquiring the state of the main component and the state of the associated component in each unit time period;

S3012, defining all predicted states of the main component and all states of the associated components; partitioning the data by the state of the main component; for each state of the main component, counting the frequency of state transitions of the associated component between adjacent time periods; and obtaining the predicted state probability distribution of the main component and the state transition probability distribution of the associated components.

In a second aspect, another embodiment of the present invention provides an intelligent fault prediction system for a cluster server, which includes a device monitoring terminal 100, a device state analysis module 200 and a device state prediction module 300;

the device monitoring terminal 100 is configured to acquire real-time information of a device, where the real-time information of the device includes vibration condition, temperature change and current stability data;

the device state analysis module 200 is configured to analyze the loss state data of the device according to the real-time information of the device, and to analyze the loss state data of the device to obtain device state data;

the device state prediction module 300 is configured to predict device faults according to the device state data.
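The division into three modules can be pictured with a small skeleton. This is a minimal sketch and not the patented implementation: the class names mirror modules 100, 200 and 300, while `Reading`, `run_once`, the callables passed to the constructors, and any numeric values used with them are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Reading:
    """One real-time sample from a device (field names are hypothetical)."""
    vibration: float
    temperature: float
    current: float


class DeviceMonitoringTerminal:
    """Stand-in for module 100: acquires real-time device information."""

    def __init__(self, source: Callable[[], Reading]) -> None:
        self._source = source

    def acquire(self) -> Reading:
        return self._source()


class DeviceStateAnalysisModule:
    """Stand-in for module 200: maps a reading to a device state index."""

    def __init__(self, model: Callable[[Reading], int]) -> None:
        # `model` is a placeholder for a trained BP neural network.
        self._model = model

    def analyze(self, reading: Reading) -> int:
        return self._model(reading)


class DeviceStatePredictionModule:
    """Stand-in for module 300: looks up the next-state distribution."""

    def __init__(self, transition: Sequence[Sequence[float]]) -> None:
        # Row j holds the assumed next-state probabilities from state j.
        self._transition = transition

    def predict(self, state: int) -> List[float]:
        return list(self._transition[state])


def run_once(
    terminal: DeviceMonitoringTerminal,
    analysis: DeviceStateAnalysisModule,
    prediction: DeviceStatePredictionModule,
) -> Tuple[int, List[float]]:
    """Run the acquire, analyze, predict sequence once."""
    reading = terminal.acquire()
    state = analysis.analyze(reading)
    return state, prediction.predict(state)
```

Passing the model and the transition table in as arguments keeps the three modules independent, which matches the way the text describes them as separately configured units.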

In a third aspect, in a further embodiment provided by the present invention, a terminal is provided, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the cluster server intelligent failure prediction method when loading and executing the computer program.

In a fourth aspect, in a further embodiment provided by the present invention, a storage medium is provided, which stores a computer program that is loaded by a processor and executed to implement the steps of the cluster server intelligent failure prediction method.

The technical scheme provided by the invention has the following beneficial effects:

The invention provides an intelligent fault prediction method, system, terminal and storage medium for a cluster server. The method acquires real-time information of a device; analyzes loss state data of the device according to the real-time information, and analyzes the loss state data to obtain device state data; and predicts device faults according to the device state data. The invention thereby detects and evaluates the condition of the components in the server, so that faults can be found in time.

These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other embodiments from these drawings without creative effort.

FIG. 1 is a flowchart of a cluster server intelligent fault prediction method according to an embodiment of the present invention;

FIG. 2 is a flowchart of S30 in the cluster server intelligent fault prediction method according to an embodiment of the present invention;

FIG. 3 is a flowchart of S302 in the cluster server intelligent fault prediction method according to an embodiment of the present invention;

FIG. 4 is a diagram of a neural network model;

FIG. 5 is a block diagram of an embodiment of a cluster server intelligent fault prediction system;

FIG. 6 is a block diagram of a terminal according to an embodiment of the present invention.

In the figures: 100, device monitoring terminal; 200, device state analysis module; 300, device state prediction module; 401, processor; 402, communication interface; 403, memory; 404, communication bus.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

Specifically, the embodiments of the present invention are further explained below with reference to the drawings.

Referring to FIG. 1, which is a flowchart of an intelligent fault prediction method for a cluster server according to an embodiment of the present invention, the method includes steps S10 to S30.

S10, acquiring real-time information of a device.

In an embodiment of the present invention, the real-time information of the device includes data such as vibration condition, temperature change and current stability.

In an embodiment of the present invention, acquiring the real-time information of the device includes performing noise reduction on the acquired information to remove noise data.
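The text does not say which noise reduction technique is used. As one possible illustration, a sliding-window median filter removes isolated spikes from a sensor trace; the function name `median_denoise` and the window size are assumptions made for this sketch.

```python
from statistics import median
from typing import List, Sequence


def median_denoise(signal: Sequence[float], window: int = 5) -> List[float]:
    """Replace each sample with the median of the window centered on it.

    The ends of the signal are padded by repeating the first and last
    samples, so the output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [float(v) for v in signal]
    if not values:
        return []
    half = window // 2
    padded = [values[0]] * half + values + [values[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(values))]
```

A median filter is chosen here because a single outlier (for example, one spurious vibration reading) does not shift the median, whereas it would shift a moving average.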

S20, analyzing the loss state data of the device according to the real-time information of the device, and analyzing the loss state data of the device to obtain the device state data.

Step S20 includes: analyzing the loss state data of the device through a BP neural network prediction model to obtain the device state data.

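In the embodiment of the present invention, the construction of the BP neural network prediction model includes the following steps:

S201, constructing a BP neural network prediction model according to the vibration condition, temperature change and current stability elements;

S202, establishing a sample data set of the form {(vibration condition, temperature change, current stability), fault type};

S203, normalizing the values of the sample data set with a normalization formula so that they lie in the range 0 to 1;

S204, inputting the sample data set into the constructed BP neural network prediction model, and outputting fault type data;

S205, determining the calculation error according to the fault type data, and adjusting the hidden-layer-to-output-layer weights and the input-layer-to-hidden-layer weights of the BP neural network prediction model;

S206, repeating steps S204 to S205 until the error meets the set value.

In the embodiment of the present invention, step S201, constructing the BP neural network prediction model according to the vibration condition, temperature change and current stability elements, includes the following: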

The variation range of each of the three elements (vibration condition, temperature change and current stability) is divided into segments, and the segments of each element form its variation range space: $V = \{V_1, V_2, \dots, V_v\}$, $T = \{T_1, T_2, \dots, T_t\}$, $E = \{E_1, E_2, \dots, E_e\}$;

The early fault symptom feature space is constructed as $A = \{A_1, A_2, \dots, A_m\}$, and the component fault feature space as $B = \{B_1, B_2, \dots, B_n\}$;

Each element $A_i$ of the fault symptom feature space is a triplet $A_i = \{V_j, T_k, E_l\}$, with $j \in \{1, 2, \dots, v\}$, $k \in \{1, 2, \dots, t\}$, $l \in \{1, 2, \dots, e\}$;

The mapping relation data set $\{A_i \to B_j \mid i \in \{1, 2, \dots, m\},\ j \in \{1, 2, \dots, n\}\}$ is collected, and the values in the data set are normalized with a normalization formula so that they lie in the range 0 to 1.
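The segmentation and normalization described above can be sketched in a few lines. The source names neither the segment boundaries nor the normalization formula, so both are assumptions here: `segment_index` locates a value among hypothetical cut points, and `min_max_normalize` applies min-max scaling, one common way to map values into the range 0 to 1.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def segment_index(value: float, cut_points: Sequence[float]) -> int:
    """Return the 0-based index of the segment that contains `value`.

    `cut_points` are the sorted interior boundaries between segments; a
    value equal to a boundary is assigned to the upper segment.
    """
    return bisect_right(cut_points, value)


def symptom_triplet(
    vibration: float,
    temperature: float,
    current: float,
    v_cuts: Sequence[float],
    t_cuts: Sequence[float],
    e_cuts: Sequence[float],
) -> Tuple[int, int, int]:
    """Build the segment-index triplet (j, k, l) for one observation."""
    return (
        segment_index(vibration, v_cuts),
        segment_index(temperature, t_cuts),
        segment_index(current, e_cuts),
    )


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, with hypothetical temperature cut points of 40, 60 and 80, a reading of 55 falls in segment index 1.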

In an embodiment of the present invention, determining the calculation error according to the fault type data and adjusting the hidden-layer-to-output-layer and input-layer-to-hidden-layer weights of the BP neural network prediction model includes: calculating the error of the model by the least squares method, and updating the weights layer by layer from back to front by gradient descent.
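Steps S204 to S206 together with the update rule above amount to an ordinary backpropagation training loop. The sketch below is one way to write it with the standard library only; the single hidden layer, sigmoid activations, learning rate, tolerance and the `BPNetwork` and `train` names are all assumptions, since the source does not fix the network architecture or hyperparameters.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """One-hidden-layer network trained by backpropagation (illustrative)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # The extra last column of each weight row is the bias weight.
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                     for _ in range(n_hidden)]
        self.w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                     for _ in range(n_out)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float], List[float]]:
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_ih] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_ho]
        return xb, hb, y

    def train_step(self, x: Sequence[float], target: Sequence[float], lr: float = 0.5) -> float:
        """One forward pass and one weight update; returns the pre-update error."""
        xb, hb, y = self.forward(x)
        # Least-squares error: E = 1/2 * sum((t - y)^2).
        error = 0.5 * sum((t - o) ** 2 for t, o in zip(target, y))
        # Output-layer deltas (dE/dz for each output unit).
        d_out = [(o - t) * o * (1.0 - o) for o, t in zip(y, target)]
        # Hidden-layer deltas, computed with the not-yet-updated output weights.
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w_ho[k][j] for k in range(len(d_out)))
            for j in range(len(hb) - 1)
        ]
        # Back to front: hidden-to-output weights first, then input-to-hidden.
        for k, row in enumerate(self.w_ho):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w_ih):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]
        return error


def train(net: BPNetwork, samples, tolerance: float = 0.01,
          max_epochs: int = 5000, lr: float = 0.5) -> Tuple[int, float]:
    """Repeat the forward pass and update until the summed error meets `tolerance`."""
    total = float("inf")
    for epoch in range(1, max_epochs + 1):
        total = sum(net.train_step(x, t, lr) for x, t in samples)
        if total <= tolerance:
            return epoch, total
    return max_epochs, total
```

Each sample is a pair of a normalized input vector (vibration, temperature, current) and a one-hot fault-type target; the `max_epochs` cap is added so the loop terminates even if the set error value is never reached.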

S30, predicting device faults according to the device state data.

In the embodiment of the present invention, step S30 includes the following steps:

S301, generating a state transition probability distribution matrix for each associated component;

S302, predicting the state of the associated component according to a Markov chain.

Step S301, generating the state transition probability distribution matrix of each associated component, includes the following steps:

S3011, selecting a history period width T, and acquiring the state of the main component and the state of the associated component in each unit time period;

S3012, defining all predicted states of the main component and all states of the associated components; partitioning the data by the state of the main component; for each state of the main component, counting the frequency of state transitions of the associated component between adjacent time periods; and obtaining the predicted state probability distribution of the main component and the state transition probability distribution of the associated components.

Illustratively, assume that the set of all predicted states of the main component a is $E = \{E_1, E_2, E_3, E_4\}$, where $E_1$ represents the fault-free state, $E_2$ a mild fault state, $E_3$ a moderate fault state, and $E_4$ a severe fault state; and that the set of all states of the associated component b is $Q = \{Q_1, Q_2, \dots, Q_4\}$, where $Q_1$ represents the fault-free state, $Q_2$ a mild fault state, $Q_3$ a moderate fault state, and $Q_4$ a severe fault state. The data are partitioned by the state of the main component, and for each state of the main component the frequency of state transitions of the associated component b between adjacent time periods is counted. In state $E_i$, these counts give the transition distribution of the associated component b, from which the state transition probability distribution of the associated component is obtained; this is the matrix written $P_i$ in step S3022.
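The counting procedure of S3011 and S3012 can be sketched as follows. Two points are assumptions made for this example and are not stated in the source: a transition from period t to period t+1 is attributed to the state the main component is in during period t, and each row of counts is turned into probabilities by dividing by the row total. The function name and the 0-based state indices are likewise hypothetical.

```python
from typing import List, Sequence


def conditional_transition_matrices(
    main_states: Sequence[int],
    assoc_states: Sequence[int],
    n_main: int,
    n_assoc: int,
) -> List[List[List[float]]]:
    """Estimate one transition matrix P_i per main-component state i.

    main_states[t] and assoc_states[t] are 0-based state indices observed
    in unit time period t. In matrix i, row j gives the estimated
    distribution of the associated component's next state, given that it
    was in state j while the main component was in state i.
    """
    if len(main_states) != len(assoc_states):
        raise ValueError("both histories must cover the same periods")
    counts = [[[0] * n_assoc for _ in range(n_assoc)] for _ in range(n_main)]
    for t in range(len(assoc_states) - 1):
        counts[main_states[t]][assoc_states[t]][assoc_states[t + 1]] += 1
    matrices = []
    for per_state in counts:
        matrix = []
        for row in per_state:
            total = sum(row)
            # Rows with no observations stay all-zero instead of being guessed.
            matrix.append([c / total if total else 0.0 for c in row])
        matrices.append(matrix)
    return matrices
```

With a short history such as `main_states = [0, 0, 0, 1, 1]` and `assoc_states = [0, 1, 0, 1, 1]`, three transitions are counted under main state 0 and one under main state 1.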

In the embodiment of the present invention, step S302, predicting the state of the associated component according to the Markov chain, includes the following steps:

S3021, acquiring the predicted state probability distribution of the main component a and the current state of the associated component b.

S3022, calculating, according to the Markov chain, the probability distribution of the next state of the associated component b for each state of the main component a. Assume that the set of all predicted states of the main component a is $E = \{E_1, E_2, E_3, E_4\}$ and that the set of all states of the associated component b is $Q = \{Q_1, Q_2, \dots, Q_4\}$.

The initial predicted state of the main component a is set to $W_0 = [w_{01}, w_{02}, \dots, w_{0i}, \dots, w_{0m}]$, where $w_{0i}$ is the probability that the predicted state of the main component is $E_i$ at time $t = 0$. Because $W_0$ is a probability distribution over the predicted states, its components sum to 1: $\sum_{i=1}^{m} w_{0i} = 1$.

Taking the predicted state $E_i$, $i \in \{1, 2, \dots, m\}$, of the main component a as an example, an arbitrary time point is selected as the start, and the state at that time point is taken as the initial state. Set $U_0 = \{0, \dots, 1, \dots, 0\}$, where $U_0$ is a $1 \times n$ unit row vector: if the $p$-th component is 1 and the other components are 0, the initial state of the system is state $p$. The state probability at the next moment, $U_{i1}$, is then:

$$U_{i1} = U_0 P_i = [p_i(1), p_i(2), \dots, p_i(k), \dots, p_i(n)]$$
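Because $U_0$ is a unit row vector, the product $U_0 P_i$ simply selects the row of $P_i$ that corresponds to the current state. A minimal sketch, in which the helper names and the transition matrix values are made up purely for illustration:

```python
from typing import List, Sequence


def unit_row_vector(p: int, n: int) -> List[float]:
    """Return the 1 x n vector with a 1 at 0-based position p and 0 elsewhere."""
    return [1.0 if j == p else 0.0 for j in range(n)]


def next_state_distribution(
    u0: Sequence[float], p_i: Sequence[Sequence[float]]
) -> List[float]:
    """Compute the row-vector-by-matrix product U_0 * P_i."""
    if len(u0) != len(p_i):
        raise ValueError("U_0 must have one entry per row of P_i")
    n_cols = len(p_i[0])
    return [sum(u0[j] * p_i[j][k] for j in range(len(u0))) for k in range(n_cols)]


# Hypothetical 4-state transition matrix for one main-component state E_i.
P_I = [
    [0.7, 0.2, 0.1, 0.0],
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
```

Calling `next_state_distribution(unit_row_vector(1, 4), P_I)` returns the second row of `P_I`, that is, the next-state distribution for a component currently in the mild fault state under these made-up numbers.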

S3023, calculating the probability distribution of the next state of the associated component b according to the predicted probability distribution of the main component a.
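The text gives no formula for this step. One natural reading, assuming the law of total probability over the predicted states of the main component, is to weight each conditional next-state distribution $U_{i1}$ by the probability $w_{0i}$ that the main component is in state $E_i$. The symbol $U_1$ is introduced here for illustration only.

```latex
U_1 \;=\; \sum_{i=1}^{m} w_{0i}\, U_{i1} \;=\; \sum_{i=1}^{m} w_{0i}\, U_0 P_i
```

Under this reading $U_1$ is again a probability distribution whenever every $U_{i1}$ is one, since the weights $w_{0i}$ are non-negative and sum to 1.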

S3024, summarizing the results to obtain the loss condition prediction for each component, including the main component and the associated components.

The invention combines a neural network algorithm with a Markov chain to assess the state of device components, so that the system can detect and evaluate the condition of the components regardless of whether component loss changes the parameters of the device.

It should be understood that although the steps are described above in a certain order, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps may be performed in other orders. Moreover, some steps of the present embodiment may include multiple sub-steps or stages, which are not necessarily performed at the same time and may be performed at different times; these sub-steps or stages need not be performed sequentially, and may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

In one embodiment, referring to FIG. 5, an intelligent fault prediction system for a cluster server is further provided, where the system includes a device monitoring terminal 100, a device state analysis module 200 and a device state prediction module 300.

The device monitoring terminal 100 is configured to acquire real-time information of a device, where the real-time information of the device includes data such as vibration condition, temperature change and current stability.

The device state analysis module 200 is configured to analyze the loss state data of the device according to the real-time information of the device, and to analyze the loss state data of the device to obtain the device state data.

The device state analysis module 200 analyzes the loss state data through a BP neural network prediction model to obtain the device state data.

As shown in FIG. 4, the construction of the BP neural network prediction model includes the following steps:

S201, constructing a BP neural network prediction model according to the vibration condition, temperature change and current stability elements;

S202, establishing a sample data set of the form {(vibration condition, temperature change, current stability), fault type};

S203, normalizing the values of the sample data set with a normalization formula so that they lie in the range 0 to 1;

S204, inputting the sample data set into the constructed BP neural network prediction model, and outputting fault type data;

S205, determining the calculation error according to the fault type data, and adjusting the hidden-layer-to-output-layer weights and the input-layer-to-hidden-layer weights of the BP neural network prediction model;

S206, repeating steps S204 to S205 until the error meets the set value.

Step S201, constructing the BP neural network prediction model from the vibration condition, temperature change and current stability elements, includes the following:

Each element $A_i$ of the fault symptom feature space is a triplet $A_i = \{V_j, T_k, E_l\}$, with $j \in \{1, 2, \dots, v\}$, $k \in \{1, 2, \dots, t\}$, $l \in \{1, 2, \dots, e\}$;

Determining the calculation error according to the fault type data and adjusting the hidden-layer-to-output-layer and input-layer-to-hidden-layer weights of the BP neural network prediction model includes: calculating the error of the model by the least squares method, and updating the weights layer by layer from back to front by gradient descent.

Illustratively, assume that the set of all predicted states of the main component a is $E = \{E_1, E_2, E_3, E_4\}$, where $E_1$ represents the fault-free state, $E_2$ a mild fault state, $E_3$ a moderate fault state, and $E_4$ a severe fault state; and that the set of all states of the associated component b is $Q = \{Q_1, Q_2, \dots, Q_4\}$, where $Q_1$ represents the fault-free state, $Q_2$ a mild fault state, $Q_3$ a moderate fault state, and $Q_4$ a severe fault state. The data are partitioned by the state of the main component, and for each state of the main component the frequency of state transitions of the associated component b between adjacent time periods is counted. In state $E_i$, these counts give the transition distribution of the associated component b, from which the state transition probability distribution of the associated component is obtained; this is the matrix written $P_i$ in step S3022.

In the embodiment of the present invention, step S302, predicting the state of the associated component according to the Markov chain, includes the following steps:

S3021, acquiring the predicted state probability distribution of the main component a and the current state of the associated component b.

S3022, calculating, according to the Markov chain, the probability distribution of the next state of the associated component for each state of the main component. Assume that the set of all predicted states of the main component a is $E = \{E_1, E_2, E_3, E_4\}$ and that the set of all states of the associated component b is $Q = \{Q_1, Q_2, \dots, Q_4\}$.

The initial predicted state of the main component a is set to $W_0 = [w_{01}, w_{02}, \dots, w_{0i}, \dots, w_{0m}]$, where $w_{0i}$ is the probability that the predicted state of the main component is $E_i$ at time $t = 0$.

Taking the predicted state $E_i$, $i \in \{1, 2, \dots, m\}$, of the main component a as an example, an arbitrary time point is selected as the start, and the state at that time point is taken as the initial state. Set $U_0 = \{0, \dots, 1, \dots, 0\}$, where $U_0$ is a $1 \times n$ unit row vector: if the $p$-th component is 1 and the other components are 0, the initial state of the system is state $p$. The state probability at the next moment, $U_{i1}$, is then:

$$U_{i1} = U_0 P_i = [p_i(1), p_i(2), \dots, p_i(k), \dots, p_i(n)]$$

S3023, calculating the probability distribution of the next state of the associated component according to the predicted probability distribution of the main component.

S3024, summarizing the results to obtain the loss condition prediction for each component, including the main component a and the associated component b.

In one embodiment, referring to FIG. 6, a terminal is further provided, which includes a processor 401, a communication interface 402, a memory 403 and a communication bus 404, where the processor 401, the communication interface 402 and the memory 403 communicate with one another through the communication bus 404.

The memory 403 is configured to store a computer program;

the processor 401 is configured to execute the computer program stored in the memory 403 and thereby carry out the cluster server intelligent fault prediction method, implementing the steps of the foregoing method embodiments.

The communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the terminal and other equipment.

The Memory may include a Random Access Memory (RAM), and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The terminal includes user equipment and network equipment. The user equipment includes, but is not limited to, computers, smartphones, PDAs, and the like; the network equipment includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud based on cloud computing and consisting of a large number of computers or network servers, where cloud computing is a form of distributed computing in which a collection of loosely coupled computers forms one super virtual computer. The terminal can operate alone to implement the invention, or can access a network and implement the invention by interacting with other terminals in the network. The network in which the terminal is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, and the like.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

In an embodiment of the invention, a storage medium is also provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.

Those skilled in the art will understand that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database or another medium in the embodiments provided herein may include at least one of non-volatile and volatile memory.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein includes any and all possible combinations of one or more of the associated listed items. The numbering of the embodiments disclosed herein is for description only and does not indicate the relative merits of the embodiments.

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features of the above embodiment or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments described above, which are not described in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within their scope.

Claims

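1. An intelligent fault prediction method for a cluster server, characterized by comprising the following steps: acquiring real-time information of a device;

analyzing loss state data of the device according to the real-time information of the device, and analyzing the loss state data of the device to obtain device state data;

and predicting device faults according to the device state data.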

2. The intelligent cluster server failure prediction method of claim 1, wherein the real-time information of the device includes vibration conditions, temperature changes, current stability, and other data.

3. The intelligent cluster server fault prediction method of claim 2, wherein analyzing the loss state data of the device according to the real-time information of the device, and analyzing the loss state data to obtain the device state data, comprises: analyzing the loss state data of the device through a BP neural network prediction model to obtain the device state data.

4. The intelligent cluster server fault prediction method of claim 3, wherein the construction of the BP neural network prediction model comprises the following steps:

S201, constructing a BP neural network prediction model according to the vibration condition, temperature change and current stability elements;

S202, establishing a sample data set according to vibration conditions, temperature changes, current stability and fault types;

S203, normalizing the values of the sample data set with a normalization formula so that they lie in the range 0 to 1;

S204, inputting the sample data set into the constructed BP neural network prediction model, and outputting fault type data;

S205, determining the calculation error according to the fault type data, and adjusting the hidden-layer-to-output-layer weights and the input-layer-to-hidden-layer weights of the BP neural network prediction model;

S206, repeating steps S204 to S205 until the error meets the set value.

5. The intelligent cluster server fault prediction method of claim 4, wherein determining the calculation error according to the fault type data and adjusting the hidden-layer-to-output-layer and input-layer-to-hidden-layer weights of the BP neural network prediction model comprises: calculating the error of the model by the least squares method, and updating the weights layer by layer from back to front by gradient descent.

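6. The intelligent cluster server fault prediction method of claim 1, wherein predicting device faults according to the device state data comprises the following steps:

S301, generating a state transition probability distribution matrix for each associated component;

S302, predicting the state of the associated component according to a Markov chain.

7. The intelligent cluster server fault prediction method of claim 6, wherein step S301, generating the state transition probability distribution matrix of each associated component, comprises the following steps:

S3011, selecting a history period width T, and acquiring the state of the main component and the state of the associated component in each unit time period;

S3012, defining all predicted states of the main component and all states of the associated components; partitioning the data by the state of the main component; for each state of the main component, counting the frequency of state transitions of the associated component between adjacent time periods; and obtaining the predicted state probability distribution of the main component and the state transition probability distribution of the associated components.

8. An intelligent cluster server fault prediction system, comprising: a device monitoring terminal 100, a device state analysis module 200 and a device state prediction module 300;

the device monitoring terminal 100 is configured to acquire real-time information of a device, where the real-time information of the device includes vibration condition, temperature change and current stability data;

the device state analysis module 200 is configured to analyze the loss state data of the device according to the real-time information of the device, and to analyze the loss state data of the device to obtain device state data;

the device state prediction module 300 is configured to predict device faults according to the device state data.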

9. A terminal comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when loading and executing the computer program, implements the steps of the cluster server intelligent fault prediction method according to any one of claims 1 to 7.

10. A storage medium storing a computer program which, when loaded and executed by a processor, carries out the steps of the cluster server intelligent fault prediction method according to any one of claims 1 to 7.