CN117707897A - Fault prediction method, device, computer equipment and storage medium - Google Patents


Publication number
CN117707897A
Authority
CN
China
Prior art keywords
period
state
determining
probability distribution
historical sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311791192.8A
Other languages
Chinese (zh)
Inventor
于淇
郑虹
张杨锴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a fault prediction method, a fault prediction apparatus, a computer device and a storage medium. It belongs to the technical field of artificial intelligence and can be used in the financial technology field or other related fields. The method comprises: determining, from the running state data of each server in a server cluster in each historical sampling period, the state probability distribution corresponding to each historical sampling period; determining, from those state probability distributions, a state transition probability matrix of the server cluster and the autocorrelation coefficients of the various running states in the association period; determining the state probability distribution corresponding to the prediction period from the state transition probability matrix, the autocorrelation coefficients of the various running states in the association period and the state probability distribution corresponding to the reference period of the association period; and determining from it the fault condition of the server cluster in the prediction period. The method can improve the accuracy of server fault prediction.

Description

Fault prediction method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a fault prediction method, apparatus, computer device and storage medium, which may be used in the financial technology field or other related fields.
Background
To keep a data center running stably, the fault condition of each server in the data center needs to be predicted so that server spare parts can be purchased sensibly, which shortens equipment maintenance time and improves budget efficiency. Fault prediction methods have been developed to meet this need.
Existing fault prediction methods usually predict the fault condition of each server from that server's historical fault condition.
However, such methods ignore the randomness of server faults, which reduces the accuracy of server fault prediction.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a fault prediction method, apparatus, computer device, and storage medium that can improve the accuracy of server fault prediction.
In a first aspect, the present application provides a fault prediction method. The method comprises the following steps:
determining state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values for various operating states;
according to the state probability distribution corresponding to each historical sampling period, determining a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the associated period; wherein the associated time period is a time period selected from each historical sampling time period according to the predicted time period;
determining state probability distribution corresponding to a prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period;
and determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
In one embodiment, determining a state probability distribution corresponding to each historical sampling period according to operational state data of each server in the server cluster in each historical sampling period includes:
for each historical sampling period, determining a state probability value of various operation states in the historical sampling period according to the operation state data of each server in the server cluster in the historical sampling period and the number of servers in the server cluster; and determining the state probability distribution corresponding to the historical sampling period according to the state probability values of various running states in the historical sampling period.
In one embodiment, determining autocorrelation coefficients of various operating states in an associated period according to a state probability distribution corresponding to each historical sampling period includes:
for each running state, determining an average probability value of the running state according to the state probability value of each historical sampling period under the running state; and determining the autocorrelation coefficient of the running state in the associated time period according to the average probability value and the state probability value of each historical sampling time period in the running state.
In one embodiment, the number of association periods is at least two, and each association period is continuous; determining an autocorrelation coefficient of the operating state in the associated time period based on the average probability value and the state probability value of each historical sampling time period in the operating state, comprising:
for each association period, determining a period difference between the association period and the prediction period; determining each cross-correlation group according to the period difference, wherein each cross-correlation group comprises two historical sampling periods and the interval between the two historical sampling periods is equal to the period difference; determining the correlation coefficient of each cross-correlation group according to the average probability value and the state probability values of the two historical sampling periods in each cross-correlation group in the running state; and determining the autocorrelation coefficient of the running state in the association period according to the correlation coefficient of each cross-correlation group, the state probability value of each historical sampling period in the running state and the average probability value.
In one embodiment, the number of association periods is at least two, and each association period is continuous; determining a state probability distribution corresponding to a prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to a reference period of the association period, wherein the method comprises the following steps:
for each association period, determining a weighted probability distribution corresponding to the association period according to the autocorrelation coefficients of various running states in the association period, the state probability distribution corresponding to the reference period of the association period and the state transition probability matrix; and taking the sum of weighted probability distributions corresponding to the associated time periods as the state probability distribution corresponding to the prediction time period.
In one embodiment, determining a state transition probability matrix of the server cluster according to a state probability distribution corresponding to each historical sampling period includes:
and carrying out normalization processing on the state probability distribution corresponding to each historical sampling period to obtain a state transition probability matrix of the server cluster.
In a second aspect, the present application further provides a fault prediction apparatus. The device comprises:
the first determining module is used for determining state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values for various operating states;
the second determining module is used for determining a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the association period according to state probability distribution corresponding to each historical sampling period; wherein the associated time period is a time period selected from each historical sampling time period according to the predicted time period;
the third determining module is used for determining state probability distribution corresponding to the prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period;
and the fault prediction module is used for determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
determining state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values for various operating states;
according to the state probability distribution corresponding to each historical sampling period, determining a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the associated period; wherein the associated time period is a time period selected from each historical sampling time period according to the predicted time period;
determining state probability distribution corresponding to a prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period;
and determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
determining state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values for various operating states;
according to the state probability distribution corresponding to each historical sampling period, determining a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the associated period; wherein the associated time period is a time period selected from each historical sampling time period according to the predicted time period;
determining state probability distribution corresponding to a prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period;
and determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
determining state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values for various operating states;
according to the state probability distribution corresponding to each historical sampling period, determining a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the associated period; wherein the associated time period is a time period selected from each historical sampling time period according to the predicted time period;
determining state probability distribution corresponding to a prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period;
and determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
According to the fault prediction method, apparatus, computer device and storage medium described above, the state probability distribution corresponding to each historical sampling period is determined from the running state data of each server in the server cluster in each historical sampling period, and from these distributions the state transition probability matrix of the server cluster and the autocorrelation coefficients of the various running states in the association period are determined. The state probability distribution corresponding to the prediction period is then determined from the state transition probability matrix, the autocorrelation coefficients of the various running states in the association period and the state probability distribution corresponding to the reference period of the association period, and the fault condition of the server cluster in the prediction period is determined from it. Compared with the prior art, which predicts the fault condition of each server only from that server's historical fault condition, the present application introduces autocorrelation coefficients and, while taking the randomness of server faults into account, combines the running state of the servers with factors such as earlier equipment faults, equipment wear and environmental trends, thereby improving the accuracy of server fault prediction.
Drawings
FIG. 1 is a flow diagram of a fault prediction method in one embodiment;
FIG. 2 is a flow chart of determining autocorrelation coefficients in one embodiment;
FIG. 3 is a flowchart illustrating determining autocorrelation coefficients according to another embodiment;
FIG. 4 is a flow chart illustrating determining a state probability distribution corresponding to a prediction period according to an embodiment;
FIG. 5 is a flow chart of a fault prediction method according to another embodiment;
FIG. 6 is a block diagram of a failure prediction apparatus in one embodiment;
FIG. 7 is a block diagram showing a structure of a failure prediction apparatus in another embodiment;
FIG. 8 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
To keep a data center running stably, the fault condition of each server in the data center needs to be predicted so that server spare parts can be purchased sensibly, which shortens equipment maintenance time and improves budget efficiency. Fault prediction methods have been developed to meet this need.
Existing fault prediction methods usually predict the fault condition of each server from that server's historical fault condition. However, such methods ignore the randomness of server faults, which reduces the accuracy of server fault prediction.
Based on this, in one embodiment, as shown in FIG. 1, a fault prediction method is provided. The method is described here as applied to a server by way of example, and specifically includes the following steps:
S101, determining state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period.
Wherein a sampling period refers to a period of data acquisition, for example, one sampling period may be one month; the historical sampling period refers to each sampling period preceding the current period; the operation state data refers to related data capable of representing an operation state of the server; the state probability distribution is used to represent a distribution of the operational states of servers in the server cluster, e.g., the state probability distribution includes state probability values for various operational states.
Alternatively, the running state data of each server in the server cluster in each historical sampling period may be input into a trained distribution determination model, and the distribution determination model determines the state probability distribution corresponding to each historical sampling period according to the running state data in each historical sampling period and the model parameters.
It can be appreciated that, in order to more accurately represent the state probability distribution, the server operating state may be divided into a normal operating state, a hard disk failure state, a network card failure state, an optical module failure state, a memory failure state, and a power failure state.
Alternatively, for each historical sampling period, determining a state probability value of each running state in the historical sampling period according to running state data of each server in the server cluster in the historical sampling period and the number of servers in the server cluster; then, according to the state probability values of various running states in the historical sampling period, the state probability distribution corresponding to the historical sampling period is determined.
Optionally, for each historical sampling period, the number of servers in each running state can be counted according to the running state data of each server in the server cluster in the historical sampling period; and then, determining a state probability value corresponding to each running state according to the number of servers in each running state and the number of servers in the server cluster.
Further, the state probability values of various operation states in the historical sampling period can be combined to obtain the state probability distribution corresponding to the historical sampling period. For example, the state probability values of various running states in the historical sampling period can be spliced to obtain the state probability distribution corresponding to the historical sampling period.
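The counting approach described for S101 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the state labels, the `state_distribution` helper and the input shape (one state label per server) are assumptions made for the example.

```python
from collections import Counter

# Hypothetical labels for the six operating states named in the text.
STATES = [
    "normal", "hard_disk_fault", "network_card_fault",
    "optical_module_fault", "memory_fault", "power_fault",
]

def state_distribution(server_states):
    """Return the state probability distribution for one sampling period.

    server_states: one state label per server in the cluster (assumed input shape).
    """
    total = len(server_states)
    if total == 0:
        raise ValueError("the cluster has no servers")
    counts = Counter(server_states)
    # State probability value = servers in that state / servers in the cluster.
    return [counts.get(state, 0) / total for state in STATES]

# Ten servers observed in one historical sampling period.
period = ["normal"] * 7 + ["hard_disk_fault"] * 2 + ["power_fault"]
dist = state_distribution(period)
```

Concatenating the per-state values in a fixed state order, as the list comprehension does, corresponds to the "splicing" mentioned above.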
S102, determining a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the association period according to state probability distribution corresponding to each historical sampling period.
The association period is a period selected from the historical sampling periods according to the prediction period. It can be understood that, to ensure the accuracy of the subsequent prediction, the several most recent consecutive historical sampling periods before the current period are generally selected as the association periods. For example, if the current period is May, the historical sampling periods February, March and April may be selected as the association periods.
Alternatively, the state probability distribution corresponding to each historical sampling period may be input into a trained matrix construction model, and the matrix construction model determines a state transition probability matrix of the server cluster according to each state probability distribution and model parameters.
Alternatively, the state probability distribution corresponding to each historical sampling period may be normalized to obtain the state transition probability matrix of the server cluster. Normalizing each state probability distribution keeps the state transition probability matrix normalized, and the size of the matrix follows from the number of operating states. For example, if there are 6 operating states, the state transition probability matrix is 6x6.
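The text does not spell out how the raw matrix is assembled before normalization, so the sketch below only illustrates the normalization property itself: every row of a state transition probability matrix sums to 1. The `normalize_rows` helper and the 3x3 input values are hypothetical, and three states are used instead of six for brevity.

```python
def normalize_rows(matrix):
    """Scale each row of a non-negative square matrix so that it sums to 1."""
    normalized = []
    for row in matrix:
        row_sum = sum(row)
        if row_sum <= 0:
            raise ValueError("each row needs a positive sum")
        normalized.append([value / row_sum for value in row])
    return normalized

# Made-up, unnormalized values for a 3-state example.
raw = [
    [8.0, 1.0, 1.0],
    [2.0, 6.0, 2.0],
    [1.0, 1.0, 2.0],
]
transition = normalize_rows(raw)
```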
Further, the state probability distribution corresponding to each historical sampling period can be input into a trained first coefficient determination model, and the first coefficient determination model determines the autocorrelation coefficients of various running states in the associated period according to each state probability distribution and model parameters.
It can be appreciated that, since the autocorrelation coefficients are determined according to the state probability distribution corresponding to each historical sampling period, the autocorrelation coefficients can characterize the elements such as the early-stage fault condition of the device, the running loss of the device, the environmental change trend and the like.
S103, determining state probability distribution corresponding to the prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period.
The reference period refers to the earliest sampling period within the association period. For example, when February, March and April are the association periods, February is the reference period.
Optionally, a candidate probability distribution corresponding to each association period may be determined according to a state probability distribution corresponding to the reference period of the association period and a state transition probability matrix; and then, weighting the candidate probability distribution corresponding to each association period by adopting the autocorrelation coefficients of various running states in the association period, so as to obtain the state probability distribution corresponding to the prediction period.
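One possible reading of the weighting step is sketched below. It assumes that the candidate distribution of an association period is obtained by advancing its reference distribution through the transition matrix once per sampling period up to the prediction period, and that each operating state has its own normalized weight. The names `step` and `predict`, the tuple layout and the 2-state numbers are all hypothetical.

```python
def step(dist, transition):
    """Advance a state probability distribution by one sampling period."""
    size = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(size)) for j in range(size)]

def predict(periods, transition):
    """Sum the weighted candidate distributions of all association periods.

    periods: list of (distribution, lag, weights) tuples, where lag is the
    number of sampling periods up to the prediction period and weights holds
    one coefficient per operating state (assumed layout).
    """
    size = len(transition)
    prediction = [0.0] * size
    for dist, lag, weights in periods:
        candidate = dist
        for _ in range(lag):
            candidate = step(candidate, transition)
        for j in range(size):
            prediction[j] += weights[j] * candidate[j]
    return prediction

# 2-state toy example: state 0 = normal, state 1 = faulty.
transition = [[0.9, 0.1], [0.5, 0.5]]
periods = [
    ([1.0, 0.0], 1, [0.5, 0.5]),
    ([1.0, 0.0], 2, [0.5, 0.5]),
]
prediction = predict(periods, transition)
```

With per-state weights the result is not guaranteed to sum to 1 in general, so a final renormalization may be needed depending on how the weights are defined.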
S104, determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
Optionally, the state probability distribution corresponding to the prediction period may be multiplied by the number of servers in the server cluster to determine the number of servers in each running state in the prediction period; then, the spare parts of the server can be purchased according to the number of the servers in each running state in the prediction period.
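The multiplication in S104 can be illustrated with a short sketch. The state labels, the cluster size of 500 and the predicted distribution are made-up example values, and rounding to the nearest whole server is an assumption that the text does not prescribe.

```python
# Hypothetical labels for the six operating states named in the text.
STATES = [
    "normal", "hard_disk_fault", "network_card_fault",
    "optical_module_fault", "memory_fault", "power_fault",
]

def expected_server_counts(distribution, server_count):
    """Scale a state probability distribution to a number of servers per state."""
    # Rounding to the nearest integer is an assumption made for this example.
    return {
        state: round(probability * server_count)
        for state, probability in zip(STATES, distribution)
    }

predicted = [0.90, 0.04, 0.02, 0.01, 0.02, 0.01]  # made-up prediction
counts = expected_server_counts(predicted, 500)
```

The per-fault counts can then feed a spare-part purchase plan, for example one replacement hard disk per server predicted to be in the hard disk failure state.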
In the fault prediction method described above, the state probability distribution corresponding to each historical sampling period is determined from the running state data of each server in the server cluster in each historical sampling period, and from these distributions the state transition probability matrix of the server cluster and the autocorrelation coefficients of the various running states in the association period are determined. The state probability distribution corresponding to the prediction period is then determined from the state transition probability matrix, the autocorrelation coefficients of the various running states in the association period and the state probability distribution corresponding to the reference period of the association period, and the fault condition of the server cluster in the prediction period is determined from it. Compared with the prior art, which predicts the fault condition of each server only from that server's historical fault condition, the present method introduces autocorrelation coefficients and, while taking the randomness of server faults into account, combines the running state of the servers with factors such as earlier equipment faults, equipment wear and environmental trends, thereby improving the accuracy of server fault prediction.
In order to ensure the accuracy of the autocorrelation coefficient determination, in this embodiment, an alternative method for determining the autocorrelation coefficient is provided, as shown in fig. 2, and specifically includes the following steps:
S201, for each operating state, determining an average probability value of the operating state according to the state probability value of each historical sampling period in the operating state.
Wherein the average probability value is used to characterize the probability mean of various operating states.
Alternatively, for each operating state, the average of the state probability values of each historical sampling period in that operating state may be used as the average probability value for that operating state.
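The averaging in S201 can be sketched as below, assuming the history is stored as one distribution per historical sampling period with states in a fixed order. The helper name and the 2-state history are illustrative only.

```python
def average_probabilities(history):
    """Return the mean state probability value of each operating state.

    history: list of state probability distributions, one per historical
    sampling period, all with the same state order (assumed layout).
    """
    period_count = len(history)
    state_count = len(history[0])
    return [
        sum(dist[i] for dist in history) / period_count
        for i in range(state_count)
    ]

# Three historical sampling periods, two operating states.
history = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
means = average_probabilities(history)
```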
S202, determining an autocorrelation coefficient of the running state in the associated period according to the average probability value and the state probability value of each historical sampling period in the running state.
Alternatively, for each operation state, the average probability value in the operation state and the state probability value of each history sampling period in the operation state may be input into the second coefficient determination model at the same time, and the second coefficient determination model determines the autocorrelation coefficient of the operation state according to the average probability value and the state probability value of each history sampling period.
In this embodiment, determining the autocorrelation coefficient of each operating state in the association period from the average probability value and the state probability value of each historical sampling period in that operating state ensures the accuracy of the autocorrelation coefficient determination.
To ensure the accuracy of the autocorrelation coefficient determination, in the above embodiment, the number of association periods is at least two, and the association periods are consecutive. Further, another alternative method for determining the autocorrelation coefficients is provided, as shown in FIG. 3, which specifically includes the following steps:
S301, for each association period, determining a period difference between the association period and the prediction period.
The period difference refers to the number of sampling periods between the association period and the prediction period. For example, if the association period is the 11th sampling period and the prediction period is the 13th sampling period, the period difference is 2.
Optionally, for each association period, the period difference between the association period and the prediction period may be determined from the number of sampling periods between them.
S302, determining each cross-correlation group according to the time interval difference value.
Wherein each cross-correlation group includes two historical sampling periods, and the interval between the two historical sampling periods is equal to the period difference.
Alternatively, each cross-correlation group may be determined from each historical sampling period based on the period difference. For example, the historical sampling period is sampling period 0-sampling period 5, and if the period difference is 2, the cross-correlation group is sampling period 0 and sampling period 2, sampling period 1 and sampling period 3, sampling period 2 and sampling period 4, sampling period 3 and sampling period 5, respectively; if the period difference is 3, the cross-correlation groups are respectively sampling period 0 and sampling period 3, sampling period 1 and sampling period 4, sampling period 2 and sampling period 5.
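The pairing rule in this example can be sketched in a few lines. The function name is hypothetical, and periods are indexed from 0 as in the example above.

```python
def cross_correlation_groups(period_count, period_difference):
    """Pair every historical sampling period t with period t + period_difference."""
    return [
        (t, t + period_difference)
        for t in range(period_count - period_difference)
    ]

# Sampling periods 0 to 5, as in the example above.
groups_lag_2 = cross_correlation_groups(6, 2)
groups_lag_3 = cross_correlation_groups(6, 3)
```

For six periods this yields four pairs when the period difference is 2 and three pairs when it is 3, matching the groups listed in the text.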
S303, determining the correlation coefficient of each cross-correlation group according to the average probability value and the state probability value of two historical sampling periods in each cross-correlation group under the running state.
Optionally, for each operating state, the correlation coefficient of each cross-correlation group may be determined with reference to formula (1) below, from the average probability value and the state probability values, in that operating state, of the two historical sampling periods in each cross-correlation group. Here S_k(i) is the correlation coefficient of the cross-correlation groups in the i-th operating state when the period difference is k; t denotes the t-th historical sampling period; m is the total number of historical sampling periods; k is the period difference. The remaining terms of the formula are the average probability value in the i-th operating state and the state probability values, in the i-th operating state, of the two historical sampling periods in a cross-correlation group, that is, of the t-th historical sampling period and of the (t+k)-th historical sampling period.
(1)
S304, determining the autocorrelation coefficient of the running state in the correlation period according to the correlation coefficient of each cross correlation group, the state probability value of each historical sampling period in the running state and the average probability value.
Optionally, for each operating state, the autocorrelation coefficient of the operating state in the association period may be determined, with reference to formula (2) below, from the correlation coefficients of the cross-correlation groups, the state probability value of each historical sampling period in the operating state, and the average probability value, where R_k(i) is the autocorrelation coefficient in the i-th operating state.
(2)
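As an illustration of steps S303 and S304, the sketch below assumes the standard sample autocorrelation form: the correlation coefficient S_k(i) is taken to be the sum, over all cross-correlation groups, of the product of the two periods' deviations from the average probability value, and R_k(i) is taken to be S_k(i) divided by the sum of squared deviations over all m historical sampling periods. These forms are an assumption consistent with the symbol definitions above; they are not quoted from formulas (1) and (2). The function names are hypothetical.

```python
def correlation_sum(probs: list[float], k: int) -> float:
    """Assumed form of S_k(i): sum over t of (p_t - mean) * (p_{t+k} - mean).

    `probs` holds the state probability values of one operating state,
    one entry per historical sampling period.
    """
    if not 0 < k < len(probs):
        raise ValueError("k must satisfy 0 < k < number of periods")
    mean = sum(probs) / len(probs)
    return sum((probs[t] - mean) * (probs[t + k] - mean)
               for t in range(len(probs) - k))


def autocorrelation(probs: list[float], k: int) -> float:
    """Assumed form of R_k(i): S_k(i) divided by the total squared deviation."""
    mean = sum(probs) / len(probs)
    denom = sum((p - mean) ** 2 for p in probs)
    if denom == 0:
        # A constant series has no variation to correlate; return 0 by convention.
        return 0.0
    return correlation_sum(probs, k) / denom


# Hypothetical probabilities of one operating state over six sampling periods.
fault_probs = [0.1, 0.2, 0.1, 0.2, 0.1, 0.2]
r1 = autocorrelation(fault_probs, 1)  # negative: neighbouring periods alternate
r2 = autocorrelation(fault_probs, 2)  # positive: periods two apart move together
```

The zero-denominator convention is a design choice of the sketch; a real implementation might instead skip such a state or report it.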
It will be appreciated that the autocorrelation coefficients of the various operating states in each association period may be normalized using formula (3) below, where W_k(i) is the normalized coefficient in the i-th operating state when the period difference is k.
(3)
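One common way to normalize such coefficients, shown here only as an assumed form, is to divide the absolute value of each coefficient by the sum of the absolute values over all period differences, so that the weights of one operating state sum to 1. The function name and the equal-weight fallback are assumptions of the sketch.

```python
def normalize_coefficients(r_by_diff: dict[int, float]) -> dict[int, float]:
    """Assumed form of W_k(i): |R_k(i)| / sum over k of |R_k(i)|.

    `r_by_diff` maps each period difference k to the autocorrelation
    coefficient of one operating state.
    """
    if not r_by_diff:
        raise ValueError("at least one period difference is required")
    total = sum(abs(r) for r in r_by_diff.values())
    if total == 0:
        # No usable correlation at any lag: fall back to equal weights (assumption).
        return {k: 1.0 / len(r_by_diff) for k in r_by_diff}
    return {k: abs(r) / total for k, r in r_by_diff.items()}


weights = normalize_coefficients({1: -0.5, 2: 0.25, 3: 0.25})
print(weights)  # {1: 0.5, 2: 0.25, 3: 0.25}
```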
In this embodiment, the period difference and the cross-correlation groups are introduced, and the autocorrelation coefficient of an operating state in the association period is determined from the correlation coefficients of the cross-correlation groups together with the state probability value of each historical sampling period in that operating state and the average probability value, which helps ensure the accuracy of the autocorrelation coefficient.
In order to ensure the accuracy of the state probability distribution corresponding to the prediction period, in the embodiment, the number of association periods is at least two, and each association period is continuous; further, another alternative method for determining the state probability distribution corresponding to the prediction period is provided, as shown in fig. 4, specifically including the following steps:
S401, for each association period, determining a weighted probability distribution corresponding to the association period according to the autocorrelation coefficients of various running states in the association period, the state probability distribution corresponding to the reference period of the association period and the state transition probability matrix.
Wherein the weighted probability distribution refers to a partial probability distribution of each associated period in relation to the predicted period. It is to be understood that, in the case where the number of association periods is at least two, the reference periods of the respective association periods are the same period.
Optionally, for each association period, the association state probability matrix P_i(k) of the association period may be determined from the period difference k between the association period and the prediction period, where P_i(k) = P^k, the k-th power of the state transition probability matrix P.
Further, the product of the association state probability matrix of the association period, the autocorrelation coefficients of various running states in the association period, and the state probability distribution corresponding to the reference period of the association period may be used as the weighted probability distribution corresponding to the association period.
S402, taking the sum of weighted probability distributions corresponding to each associated period as the state probability distribution corresponding to the prediction period.
Optionally, after the weighted probability distribution corresponding to each association period is determined, the sum of the weighted probability distributions corresponding to the association periods may be taken as the state probability distribution corresponding to the prediction period, with reference to formula (4) below. In formula (4), the two distribution terms are the state probability distribution corresponding to the reference period of the association period and the state probability distribution corresponding to the prediction period; n denotes the current period.
(4)
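Steps S401 and S402 can be sketched as below under several stated assumptions: the state probability distribution is treated as a row vector that is multiplied by the k-th power of the state transition probability matrix, the per-state weights are applied elementwise, a single reference distribution is shared by all association periods, and no renormalization is applied to the sum. All function and parameter names are hypothetical, and the sketch uses plain Python lists rather than a matrix library.

```python
def mat_mul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][j] * b[j][c] for j in range(len(b)))
             for c in range(len(b[0]))] for i in range(len(a))]


def mat_pow(p: list[list[float]], k: int) -> list[list[float]]:
    """k-th power of a square matrix by repeated multiplication."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, p)
    return result


def predict_distribution(reference: list[float],
                         transition: list[list[float]],
                         weights: dict[int, list[float]]) -> list[float]:
    """Sum of weighted probability distributions over all association periods.

    `weights[k][i]` is the normalized coefficient of state i for period
    difference k (assumed elementwise weighting).
    """
    n = len(reference)
    predicted = [0.0] * n
    for k, w in weights.items():
        pk = mat_pow(transition, k)  # association state probability matrix
        stepped = [sum(reference[i] * pk[i][j] for i in range(n)) for j in range(n)]
        for j in range(n):
            predicted[j] += w[j] * stepped[j]  # weighted probability distribution
    return predicted


# Hypothetical two-state example (e.g. normal, fault) with two association periods.
P = [[0.9, 0.1], [0.5, 0.5]]
result = predict_distribution([1.0, 0.0], P, {1: [0.6, 0.6], 2: [0.4, 0.4]})
```

Because the weights here are the same for every state and sum to 1 across the period differences, the result remains a probability distribution; with state-specific weights the entries may need to be rescaled, which the sketch deliberately leaves out.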
In this embodiment, the weighted probability distribution is introduced, and the sum of the weighted probability distributions corresponding to the associated periods is used as the state probability distribution corresponding to the prediction period, so that the accuracy of the state probability distribution corresponding to the prediction period can be ensured.
Fig. 5 is a schematic flow chart of a fault prediction method in another embodiment, and on the basis of the foregoing embodiment, this embodiment provides an alternative example of a fault prediction method. With reference to fig. 5, the specific implementation procedure is as follows:
S501, determining state probability values of various operation states in each historical sampling period according to the operation state data of each server in the server cluster in each historical sampling period and the number of servers in the server cluster.
S502, determining state probability distribution corresponding to each historical sampling period according to state probability values of various running states in each historical sampling period.
Wherein the state probability distribution includes state probability values for various operating states.
S503, carrying out normalization processing on the state probability distribution corresponding to each historical sampling period to obtain a state transition probability matrix of the server cluster.
S504, determining the average probability value of various running states according to the state probability values of various running states of each historical sampling period.
S505, a period difference between each associated period and the predicted period is determined.
S506, determining a cross-correlation group corresponding to each association period according to the period difference value between each association period and the prediction period.
Wherein each cross-correlation group includes two historical sampling periods, and the interval between the two historical sampling periods is equal to the period difference.
S507, determining correlation coefficients of the cross-correlation groups corresponding to the association periods in various operation states according to the average probability values of various operation states and the state probability values of the two history sampling periods in the cross-correlation groups corresponding to the association periods in various operation states.
S508, determining the autocorrelation coefficients of various running states in each association period according to the correlation coefficients of the cross-correlation groups corresponding to the various association periods in various running states, the state probability values of the historical sampling periods in various running states and the average probability values of the various running states.
Wherein the associated period is a period selected from each of the historical sampling periods according to the predicted period.
S509, for each association period, determining a weighted probability distribution corresponding to the association period according to the autocorrelation coefficients of various running states in the association period, the state probability distribution corresponding to the reference period of the association period, and the state transition probability matrix.
S510, taking the sum of weighted probability distributions corresponding to each associated period as the state probability distribution corresponding to the prediction period.
S511, determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
For the specific process of S501-S511, reference may be made to the description of the above method embodiments; the implementation principle and technical effects are similar and are not repeated here.
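By way of illustration, steps S501 and S502 can be sketched as counting how many servers are in each operating state during a sampling period and dividing by the number of servers in the cluster. The state labels and the function name below are hypothetical and chosen only for the example.

```python
from collections import Counter


def state_distribution(server_states: list[str], all_states: list[str]) -> list[float]:
    """State probability distribution of one historical sampling period.

    `server_states` holds one observed state label per server;
    `all_states` fixes the order of the states in the result.
    """
    if not server_states:
        raise ValueError("the server cluster must contain at least one server")
    counts = Counter(server_states)
    total = len(server_states)  # number of servers in the cluster
    return [counts[s] / total for s in all_states]


# Hypothetical cluster of four servers and three operating states.
states = ["normal", "warning", "fault"]
dist = state_distribution(["normal", "normal", "fault", "warning"], states)
print(dist)  # [0.5, 0.25, 0.25]
```

Repeating this for every historical sampling period gives the per-period state probability values that the later steps average and correlate.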
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the application also provide a fault prediction device for implementing the fault prediction method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more fault prediction device embodiments provided below, reference may be made to the limitations of the fault prediction method above, which are not repeated here.
In one embodiment, as shown in fig. 6, there is provided a failure prediction apparatus 1 including: a first determination module 10, a second determination module 20, a third determination module 30, and a failure prediction module 40, wherein:
a first determining module 10, configured to determine a state probability distribution corresponding to each historical sampling period according to operation state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values for various operating states;
the second determining module 20 is configured to determine a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the association period according to the state probability distribution corresponding to each historical sampling period; wherein the associated time period is a time period selected from each historical sampling time period according to the predicted time period;
a third determining module 30, configured to determine a state probability distribution corresponding to the prediction period according to the state transition probability matrix, the autocorrelation coefficients of various operation states in the association period, and the state probability distribution corresponding to the reference period of the association period;
the failure prediction module 40 is configured to determine a failure condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
In one embodiment, the first determining module 10 is specifically configured to:
for each historical sampling period, determining a state probability value of various operation states in the historical sampling period according to the operation state data of each server in the server cluster in the historical sampling period and the number of servers in the server cluster; and determining the state probability distribution corresponding to the historical sampling period according to the state probability values of various running states in the historical sampling period.
In one embodiment, as shown in fig. 7, the second determining module 20 includes:
an average value determination unit 21 for determining, for each running state, an average probability value of the running state based on the state probability value of each historical sampling period in that running state;
the coefficient determining unit 22 is configured to determine an autocorrelation coefficient of the operation state in the associated period based on the average probability value and the state probability value of each historical sampling period in the operation state.
In one embodiment, the number of association periods is at least two, and each association period is continuous, and the coefficient determining unit 22 is specifically configured to:
for each association period, determining a period difference between the association period and the prediction period; determining each cross-correlation group according to the period difference; wherein each cross-correlation group comprises two historical sampling periods, and the interval between the two historical sampling periods is equal to the period difference; determining the correlation coefficient of each cross-correlation group according to the average probability value and the state probability values of the two historical sampling periods in each cross-correlation group in the running state; and determining the autocorrelation coefficient of the running state in the association period according to the correlation coefficient of each cross-correlation group, the state probability value of each historical sampling period in the running state and the average probability value.
In one embodiment, the number of association periods is at least two, and each association period is continuous, and the third determining module 30 is specifically configured to:
for each association period, determining a weighted probability distribution corresponding to the association period according to the autocorrelation coefficients of various running states in the association period, the state probability distribution corresponding to the reference period of the association period and the state transition probability matrix; and taking the sum of weighted probability distributions corresponding to the associated time periods as the state probability distribution corresponding to the prediction time period.
In one embodiment, the second determination module 20 is further configured to:
and carrying out normalization processing on the state probability distribution corresponding to each historical sampling period to obtain a state transition probability matrix of the server cluster.
The modules in the above fault prediction device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or be independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 8. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store operational status data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a fault prediction method.
It will be appreciated by those skilled in the art that the structure shown in fig. 8 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
determining state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values for various operating states;
according to the state probability distribution corresponding to each historical sampling period, determining a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the associated period; wherein the associated time period is a time period selected from each historical sampling time period according to the predicted time period;
determining state probability distribution corresponding to a prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period;
And determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
In one embodiment, when the processor executes logic in the computer program to determine a state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period, the following steps are specifically implemented:
for each historical sampling period, determining a state probability value of various operation states in the historical sampling period according to the operation state data of each server in the server cluster in the historical sampling period and the number of servers in the server cluster; and determining the state probability distribution corresponding to the historical sampling period according to the state probability values of various running states in the historical sampling period.
In one embodiment, when the processor executes logic in the computer program to determine autocorrelation coefficients of various operating states in the associated time period according to the state probability distribution corresponding to each historical sampling time period, the following steps are specifically implemented:
for each running state, determining an average probability value of the running state according to the state probability value of each historical sampling period under the running state; and determining the autocorrelation coefficient of the running state in the associated time period according to the average probability value and the state probability value of each historical sampling time period in the running state.
In one embodiment, the number of association periods is at least two, and each association period is continuous; when the processor executes logic in the computer program for determining the autocorrelation coefficient of the running state in the association period according to the average probability value and the state probability value of each historical sampling period in the running state, the following steps are specifically implemented:
for each association period, determining a period difference between the association period and the prediction period; determining each cross-correlation group according to the period difference; wherein each cross-correlation group comprises two historical sampling periods, and the interval between the two historical sampling periods is equal to the period difference; determining the correlation coefficient of each cross-correlation group according to the average probability value and the state probability values of the two historical sampling periods in each cross-correlation group in the running state; and determining the autocorrelation coefficient of the running state in the association period according to the correlation coefficient of each cross-correlation group, the state probability value of each historical sampling period in the running state and the average probability value.
In one embodiment, the number of the association periods is at least two, and each association period is continuous, when the processor executes logic for determining the state probability distribution corresponding to the prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period, and the state probability distribution corresponding to the reference period of the association period in the computer program, the following steps are specifically implemented:
For each association period, determining a weighted probability distribution corresponding to the association period according to the autocorrelation coefficients of various running states in the association period, the state probability distribution corresponding to the reference period of the association period and the state transition probability matrix; and taking the sum of weighted probability distributions corresponding to the associated time periods as the state probability distribution corresponding to the prediction time period.
In one embodiment, when the processor executes logic in the computer program to determine a state transition probability matrix of the server cluster according to the state probability distribution corresponding to each historical sampling period, the following steps are specifically implemented:
and carrying out normalization processing on the state probability distribution corresponding to each historical sampling period to obtain a state transition probability matrix of the server cluster.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
determining state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values for various operating states;
According to the state probability distribution corresponding to each historical sampling period, determining a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the associated period; wherein the associated time period is a time period selected from each historical sampling time period according to the predicted time period;
determining state probability distribution corresponding to a prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period;
and determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
In one embodiment, the code logic in the computer program for determining a state probability distribution corresponding to each historical sampling period based on operational state data of each server in the cluster of servers over each historical sampling period is executed by the processor and comprises:
for each historical sampling period, determining a state probability value of various operation states in the historical sampling period according to the operation state data of each server in the server cluster in the historical sampling period and the number of servers in the server cluster; and determining the state probability distribution corresponding to the historical sampling period according to the state probability values of various running states in the historical sampling period.
In one embodiment, the code logic in the computer program for determining the autocorrelation coefficients of various operating states during the correlation period based on the state probability distribution corresponding to each historical sampling period, when executed by the processor, performs the steps of:
for each running state, determining an average probability value of the running state according to the state probability value of each historical sampling period under the running state; and determining the autocorrelation coefficient of the running state in the associated time period according to the average probability value and the state probability value of each historical sampling time period in the running state.
In one embodiment, the number of the association periods is at least two, and each association period is continuous, and the code logic for determining the autocorrelation coefficient of the running state in the association period according to the average probability value and the state probability value of each history sampling period in the running state in the computer program is executed by the processor, and specifically realizes the following steps:
for each association period, determining a period difference between the association period and the prediction period; determining each cross-correlation group according to the period difference; wherein each cross-correlation group comprises two historical sampling periods, and the interval between the two historical sampling periods is equal to the period difference; determining the correlation coefficient of each cross-correlation group according to the average probability value and the state probability values of the two historical sampling periods in each cross-correlation group in the running state; and determining the autocorrelation coefficient of the running state in the association period according to the correlation coefficient of each cross-correlation group, the state probability value of each historical sampling period in the running state and the average probability value.
In one embodiment, the number of the association periods is at least two, and each association period is continuous, and the code logic for determining the state probability distribution corresponding to the prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period in the computer program is executed by the processor, and specifically realizes the following steps:
for each association period, determining a weighted probability distribution corresponding to the association period according to the autocorrelation coefficients of various running states in the association period, the state probability distribution corresponding to the reference period of the association period and the state transition probability matrix; and taking the sum of weighted probability distributions corresponding to the associated time periods as the state probability distribution corresponding to the prediction time period.
In one embodiment, the code logic in the computer program for determining the state transition probability matrix of the server cluster according to the state probability distribution corresponding to each historical sampling period is executed by the processor, and specifically implements the following steps:
and carrying out normalization processing on the state probability distribution corresponding to each historical sampling period to obtain a state transition probability matrix of the server cluster.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
determining state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values for various operating states;
according to the state probability distribution corresponding to each historical sampling period, determining a state transition probability matrix of the server cluster and autocorrelation coefficients of various running states in the associated period; wherein the associated time period is a time period selected from each historical sampling time period according to the predicted time period;
determining state probability distribution corresponding to a prediction period according to the state transition probability matrix, the autocorrelation coefficients of various running states in the association period and the state probability distribution corresponding to the reference period of the association period;
and determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
In one embodiment, when the computer program is executed by the processor to determine the state probability distribution corresponding to each historical sampling period according to the operation state data of each server in the server cluster in each historical sampling period, the following steps are specifically implemented:
For each historical sampling period, determining a state probability value of various operation states in the historical sampling period according to the operation state data of each server in the server cluster in the historical sampling period and the number of servers in the server cluster; and determining the state probability distribution corresponding to the historical sampling period according to the state probability values of various running states in the historical sampling period.
In one embodiment, the computer program is executed by the processor to determine autocorrelation coefficients of various operating states within an associated time period based on a state probability distribution corresponding to each historical sampling time period, and specifically implements the steps of:
for each running state, determining an average probability value of the running state according to the state probability value of each historical sampling period under the running state; and determining the autocorrelation coefficient of the running state in the associated time period according to the average probability value and the state probability value of each historical sampling time period in the running state.
In one embodiment, the number of association periods is at least two, and each association period is continuous; when the computer program is executed by the processor to determine the autocorrelation coefficient of the running state in the association period according to the average probability value and the state probability value of each historical sampling period in the running state, the following steps are specifically implemented:
for each association period, determining a period difference between the association period and the prediction period; determining each cross-correlation group according to the period difference; wherein each cross-correlation group comprises two historical sampling periods, and the interval between the two historical sampling periods is equal to the period difference; determining the correlation coefficient of each cross-correlation group according to the average probability value and the state probability values of the two historical sampling periods in each cross-correlation group in the running state; and determining the autocorrelation coefficient of the running state in the association period according to the correlation coefficient of each cross-correlation group, the state probability value of each historical sampling period in the running state and the average probability value.
In one embodiment, the number of association periods is at least two and the association periods are consecutive; when the computer program is executed by the processor to determine the state probability distribution corresponding to the prediction period according to the state transition probability matrix, the autocorrelation coefficients of the various running states in the association period, and the state probability distribution corresponding to the reference period of the association period, the following steps are specifically implemented:
for each association period, determining a weighted probability distribution corresponding to the association period according to the autocorrelation coefficients of the various running states in the association period, the state probability distribution corresponding to the reference period of the association period, and the state transition probability matrix; and taking the sum of the weighted probability distributions corresponding to the association periods as the state probability distribution corresponding to the prediction period.
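One possible reading of this weighted combination is sketched below. Several details are assumptions rather than statements of the embodiment: the weight of each association period is taken, per running state, as the absolute autocorrelation coefficient normalized across association periods; the k-th association period is assumed to lie k + 1 periods before the prediction period, so its reference distribution is advanced by k + 1 applications of the matrix; and the final rescaling to a total of 1 is added for convenience. The names `step` and `predict_distribution` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def step(dist: Sequence[float], matrix: Matrix) -> Vector:
    """Advance a state distribution by one period (row vector times matrix)."""
    size = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]


def predict_distribution(
    matrix: Matrix,
    reference_dists: Sequence[Sequence[float]],
    autocorrs: Sequence[Sequence[float]],
) -> Vector:
    """Sum the weighted probability distributions of all association periods.

    reference_dists[k] and autocorrs[k] belong to the association period whose
    period difference from the prediction period is k + 1 (assumption).
    """
    n_states = len(matrix)
    # Assumed weighting: per state, normalize |autocorrelation| across periods.
    totals = [sum(abs(r[s]) for r in autocorrs) for s in range(n_states)]
    prediction = [0.0] * n_states
    for k, (dist, coeffs) in enumerate(zip(reference_dists, autocorrs)):
        forecast = list(dist)
        for _ in range(k + 1):  # assumed: k + 1 applications of the matrix
            forecast = step(forecast, matrix)
        for s in range(n_states):
            weight = abs(coeffs[s]) / totals[s] if totals[s] else 0.0
            prediction[s] += weight * forecast[s]  # weighted probability distribution
    total = sum(prediction)
    # Rescaling is an added assumption so that the result sums to 1.
    return [p / total for p in prediction] if total else prediction
```

Because the weights are computed per running state, the raw sum need not total 1, which is why the sketch rescales at the end; an implementation that uses one scalar weight per association period would not need that step.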
In one embodiment, when the computer program is executed by the processor to determine the state transition probability matrix of the server cluster according to the state probability distribution corresponding to each historical sampling period, the following steps are specifically implemented:
performing normalization processing on the state probability distribution corresponding to each historical sampling period to obtain the state transition probability matrix of the server cluster.
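The passage above does not spell out how the normalized distributions are arranged into a square matrix, so the following is only a sketch of one way it could be done. It assumes that "soft" transition counts are accumulated from the distributions of consecutive historical sampling periods and that each row is then normalized to sum to 1. The function name `transition_matrix`, the outer-product counting, and the fallback for unobserved states are all assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transition_matrix(dists: Sequence[Sequence[float]]) -> Matrix:
    """Estimate a row-stochastic matrix from consecutive state distributions."""
    if len(dists) < 2:
        raise ValueError("need at least two historical sampling periods")
    n_states = len(dists[0])
    # Assumed soft transition counts: outer product of consecutive distributions.
    counts = [[0.0] * n_states for _ in range(n_states)]
    for current, following in zip(dists, dists[1:]):
        for i in range(n_states):
            for j in range(n_states):
                counts[i][j] += current[i] * following[j]

    # Normalization: scale each row so that its entries sum to 1.
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # State never observed: fall back to staying in place (assumption).
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([value / total for value in row])
    return matrix
```

Whatever construction is used, the row normalization is what makes every row a valid probability distribution over next states.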
It should be noted that the data involved in the present application (including, but not limited to, running state data) are all data authorized by the user or fully authorized by all parties, and that the collection, use, and processing of the related data must comply with the relevant regulations.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may perform the steps of the method embodiments described above. Any reference to memory, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational databases and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above embodiments represent only a few implementations of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, and all of these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method of fault prediction, the method comprising:
determining a state probability distribution corresponding to each historical sampling period according to running state data of each server in a server cluster in each historical sampling period; wherein the state probability distribution includes state probability values of various running states;
determining a state transition probability matrix of the server cluster and autocorrelation coefficients of the various running states in an association period according to the state probability distribution corresponding to each historical sampling period; wherein the association period is a period selected from the historical sampling periods according to a prediction period;
determining a state probability distribution corresponding to the prediction period according to the state transition probability matrix, the autocorrelation coefficients of the various running states in the association period, and a state probability distribution corresponding to a reference period of the association period;
and determining a fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
2. The method of claim 1, wherein determining the state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period comprises:
for each historical sampling period, determining state probability values of the various running states in the historical sampling period according to the running state data of each server in the server cluster in the historical sampling period and the number of servers in the server cluster;
and determining the state probability distribution corresponding to the historical sampling period according to the state probability values of the various running states in the historical sampling period.
3. The method of claim 1, wherein determining the autocorrelation coefficients of the various running states in the association period according to the state probability distribution corresponding to each historical sampling period comprises:
for each running state, determining an average probability value of the running state according to the state probability values of the running state in the historical sampling periods;
and determining the autocorrelation coefficient of the running state in the association period according to the average probability value and the state probability values of the running state in the historical sampling periods.
4. The method of claim 3, wherein the number of association periods is at least two and the association periods are consecutive;
the determining the autocorrelation coefficient of the running state in the association period according to the average probability value and the state probability values of the running state in the historical sampling periods comprises:
for each association period, determining a period difference between the association period and the prediction period;
determining cross-correlation groups according to the period difference; wherein each cross-correlation group comprises two historical sampling periods, and the interval between the two historical sampling periods is equal to the period difference;
determining a correlation coefficient of each cross-correlation group according to the average probability value and the state probability values, in the running state, of the two historical sampling periods in the cross-correlation group;
and determining the autocorrelation coefficient of the running state in the association period according to the correlation coefficients of the cross-correlation groups, the state probability values of the running state in the historical sampling periods, and the average probability value.
5. The method of claim 1, wherein the number of association periods is at least two and the association periods are consecutive;
the determining the state probability distribution corresponding to the prediction period according to the state transition probability matrix, the autocorrelation coefficients of the various running states in the association period, and the state probability distribution corresponding to the reference period of the association period comprises:
for each association period, determining a weighted probability distribution corresponding to the association period according to the autocorrelation coefficients of the various running states in the association period, the state probability distribution corresponding to the reference period of the association period, and the state transition probability matrix;
and taking the sum of the weighted probability distributions corresponding to the association periods as the state probability distribution corresponding to the prediction period.
6. The method of claim 1, wherein determining a state transition probability matrix for the server cluster based on the state probability distribution corresponding to each historical sampling period comprises:
performing normalization processing on the state probability distribution corresponding to each historical sampling period to obtain the state transition probability matrix of the server cluster.
7. A fault prediction device, the device comprising:
the first determining module is used for determining the state probability distribution corresponding to each historical sampling period according to the running state data of each server in the server cluster in each historical sampling period; wherein the state probability distribution includes state probability values of various running states;
the second determining module is used for determining a state transition probability matrix of the server cluster and autocorrelation coefficients of the various running states in the association period according to the state probability distribution corresponding to each historical sampling period; wherein the association period is a period selected from the historical sampling periods according to a prediction period;
a third determining module, configured to determine a state probability distribution corresponding to the prediction period according to the state transition probability matrix, autocorrelation coefficients of various operation states in the association period, and a state probability distribution corresponding to a reference period of the association period;
and the fault prediction module is used for determining the fault condition of the server cluster in the prediction period according to the state probability distribution corresponding to the prediction period.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
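As an illustration of the first and last steps recited in claims 1 and 2, the sketch below computes a state probability distribution for one sampling period as the share of servers in each running state, and then applies a simple decision rule to a distribution. The state labels in `STATES`, the function names, and the threshold rule in `fault_expected` are hypothetical; the claims do not prescribe a particular decision rule or set of states.

```python
from collections import Counter
from typing import Dict, Sequence

STATES = ("normal", "degraded", "fault")  # hypothetical running-state labels


def state_distribution(server_states: Sequence[str]) -> Dict[str, float]:
    """Share of servers in each running state during one sampling period."""
    if not server_states:
        raise ValueError("the server cluster must contain at least one server")
    counts = Counter(server_states)
    # State probability value = servers in the state / number of servers.
    return {state: counts[state] / len(server_states) for state in STATES}


def fault_expected(predicted: Dict[str, float], threshold: float = 0.2) -> bool:
    """Assumed decision rule: flag the period if the fault share exceeds a threshold."""
    return predicted["fault"] > threshold


if __name__ == "__main__":
    observed = ["normal", "normal", "fault", "degraded"]  # made-up sample
    dist = state_distribution(observed)
    print(dist, fault_expected(dist))
```

In practice the distribution passed to the decision rule would be the one predicted for the prediction period rather than an observed one; the observed sample is used here only to keep the example self-contained.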
CN202311791192.8A 2023-12-25 2023-12-25 Fault prediction method, device, computer equipment and storage medium Pending CN117707897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311791192.8A CN117707897A (en) 2023-12-25 2023-12-25 Fault prediction method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311791192.8A CN117707897A (en) 2023-12-25 2023-12-25 Fault prediction method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117707897A true CN117707897A (en) 2024-03-15

Family

ID=90145944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311791192.8A Pending CN117707897A (en) 2023-12-25 2023-12-25 Fault prediction method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117707897A (en)

Similar Documents

Publication Publication Date Title
CN115809569B (en) Reliability evaluation method and device based on coupling competition failure model
CN116167289A (en) Power grid operation scene generation method and device, computer equipment and storage medium
CN116719646A (en) Hot spot data processing method, device, electronic device and storage medium
CN117707897A (en) Fault prediction method, device, computer equipment and storage medium
CN116383708A (en) Transaction account identification method and device
CN116401238A (en) Deviation monitoring method, apparatus, device, storage medium and program product
CN116108697A (en) Acceleration test data processing method, device and equipment based on multiple performance degradation
CN116227127A (en) Method and device for determining performance of transformer, computer equipment and storage medium
CN115993536A (en) Method, apparatus, device, storage medium, and product for estimating remaining battery energy
CN115862653A (en) Audio denoising method and device, computer equipment and storage medium
CN115481767A (en) Operation data processing method and device for power distribution network maintenance and computer equipment
CN111476356B (en) Memristive neural network training method, device, equipment and storage medium
CN117251225A (en) Application automatic exit method, device, computer equipment and storage medium
CN112446472A (en) Method, apparatus and related product for processing data
CN115759180A (en) Data generation method, apparatus and computer-readable storage medium for brain simulation
CN117349184A (en) Test case generation method, device, computer equipment and storage medium
CN117873304A (en) Low-power-consumption operation method and device applied to chip, chip and storage medium
CN116976399A (en) Training method of fault prediction model, fault prediction method and device
CN117578487A (en) Non-invasive load monitoring method, device and computer equipment
CN117313952A (en) Load prediction method, device, equipment and storage medium
CN116643961A (en) Performance data complement method, device, equipment and storage medium
CN117312306A (en) Financial business data sheet conversion method, apparatus, device, medium and program product
CN118095623A (en) Reactor operation strategy generation method, device, equipment, medium and product
CN117852818A (en) Method, device, computer equipment and storage medium for generating item code
CN118100199A (en) New energy core load balancing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination