CN109726084B - Method and device for analyzing fault problem of data center - Google Patents

Info

Publication number
CN109726084B
Authority
CN
China
Prior art keywords
fault
weight coefficient
factor
determining
fault factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811525441.8A
Other languages
Chinese (zh)
Other versions
CN109726084A (en
Inventor
贾艳成
李仲夷
张驰
时旭
黄凯鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NetsUnion Clearing Corp
Original Assignee
NetsUnion Clearing Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NetsUnion Clearing Corp
Priority to CN201811525441.8A
Publication of CN109726084A
Application granted
Publication of CN109726084B

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and a device for analyzing a fault problem of a data center, wherein the method comprises the following steps: monitoring state information of each fault factor of the fault problem; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved.

Description

Method and device for analyzing fault problem of data center
Technical Field
The invention relates to the technical field of fault identification, in particular to a method and a device for analyzing a fault problem of a data center.
Background
With the development of big data technology, the development of data centers is more and more rapid. The data center provides various services with large scale, high quality, safety and reliability for internet content providers, enterprises, media, various websites and the like. Therefore, it is important to ensure safe and reliable operation of the data center.
When a data center is in operation, system problems, hardware failures, and the like cause fault problems such as IDC (Internet Data Center) faults, bank faults, and private line faults. It is therefore necessary to detect a fault problem in time, so that the fault can be removed promptly and normal operation of the data center restored quickly.
When a fault problem occurs, abnormalities appear in a plurality of fault factors at the same time. In the related art, to ensure safe and reliable operation of a data center, the occurrence probability of a fault problem is calculated by monitoring the states of the fault factors that accompany the fault problem, while the weight coefficient of each fault factor is estimated manually. However, manually estimated weight coefficients carry errors because experience differs from person to person, so the calculated occurrence probability lacks scientific support and the recognition rate of the fault problem is reduced.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present invention is to provide an analysis method for failure problems of a data center, which is used to solve the problem of low recognition rate of failure problems in the prior art.
A second object of the present invention is to provide an apparatus for analyzing a failure problem in a data center.
A third object of the invention is to propose a computer device.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for analyzing a failure problem of a data center, including:
monitoring state information of each fault factor of the fault problem;
determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem;
and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor.
Further, the method further comprises:
and establishing a layered structure of the fault problem, wherein the layered structure sequentially comprises a total target layer, a sub-target layer and a fault factor layer from top to bottom, the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M, N are positive integers, and the M control targets are obtained by subdividing the total target.
Further, the determining a weight coefficient of the fault factor according to the pre-established hierarchical structure of the fault problem includes:
acquiring a weight coefficient of an mth control target and a relative weight coefficient of the mth control target and each fault factor, wherein m = 1, 2, 3, ..., M;
and for the nth fault factor, determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor, wherein n = 1, 2, 3, ..., N.
Further, the determining, for the nth fault factor, the weighting factor of the nth fault factor according to the weighting factors of the M control targets and the relative weighting factor of each control target to the nth fault factor includes:
determining the weight coefficient $w_n$ of the nth fault factor according to the following formula:
$$w_n = \sum_{m=1}^{M} v_m \, u_n^{(m)}$$
wherein $v_m$ represents the weight coefficient of the mth control target, and $u_n^{(m)}$ represents the relative weight coefficient of the mth control target and the nth fault factor.
Further, the obtaining of the relative weight coefficient of the mth control target and each fault factor includes:
acquiring a pre-established second scale table which stores scales between fault factors related to the mth control target;
creating a matrix according to each scale, and determining a characteristic vector of the matrix;
and determining the relative weight coefficient of the mth control target and each fault factor according to the feature vector.
According to the method for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved.
In order to achieve the above object, a second embodiment of the present invention provides a failure problem analysis device for a data center, including:
the monitoring module is used for monitoring the state information of each fault factor of the fault problem;
the weight coefficient determining module is used for determining the weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem;
and the probability calculation module is used for determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor.
Further, the apparatus further comprises:
an establishing module, used for establishing a layered structure of the fault problem, wherein the layered structure sequentially comprises a total target layer, a sub-target layer and a fault factor layer from top to bottom, the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M and N are positive integers, and the M control targets are obtained by subdividing the total target.
Further, the weight coefficient determination module comprises a first unit and a second unit;
the first unit is used for acquiring a weight coefficient of an mth control target and acquiring a relative weight coefficient of the mth control target and each fault factor, wherein m = 1, 2, 3, ..., M;
and the second unit is used for determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor, wherein n = 1, 2, 3, ..., N.
Further, the second unit is specifically configured to:
determining the weight coefficient $w_n$ of the nth fault factor according to the following formula:
$$w_n = \sum_{m=1}^{M} v_m \, u_n^{(m)}$$
wherein $v_m$ represents the weight coefficient of the mth control target, and $u_n^{(m)}$ represents the relative weight coefficient of the mth control target and the nth fault factor.
Further, the first unit is specifically configured to:
acquiring a pre-established second scale table which stores scales between fault factors related to the mth control target;
creating a matrix according to each scale, and determining a characteristic vector of the matrix;
and determining the relative weight coefficient of the mth control target and each fault factor according to the feature vector.
According to the device for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved.
To achieve the above object, a third embodiment of the present invention provides a computer device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for analyzing the fault problem of the data center as described above.
In order to achieve the above object, a fourth aspect of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method for analyzing the failure problem of the data center as described above.
In order to achieve the above object, a fifth aspect of the present invention provides a computer program product, wherein when instructions in the computer program product are executed by a processor, a method for analyzing a failure problem of a data center is performed, the method comprising:
monitoring state information of each fault factor of the fault problem;
determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem;
and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a method for analyzing a fault problem of a data center according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another method for analyzing a failure problem in a data center according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an analysis apparatus for a fault problem of a data center according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of another analysis apparatus for a fault problem of a data center according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to explain the invention, and are not to be construed as limiting the invention.
The following describes a method and an apparatus for analyzing a failure problem in a data center according to an embodiment of the present invention with reference to the drawings.
Fig. 1 is a schematic flowchart of a method for analyzing a fault problem of a data center according to an embodiment of the present invention. The embodiment provides an analysis method for the fault problem of the data center, the execution subject of the analysis method is an analysis device for the fault problem of the data center, and the execution subject is composed of hardware and/or software.
As shown in fig. 1, the method for analyzing the fault problem of the data center includes:
S101, monitoring the state information of each fault factor of the fault problem.
In this embodiment, the failure problem may be understood as an event or a phenomenon that the data center loses or reduces its specified function, and is represented as an abnormal operation of the data center. The classification of the failure problem is classified by the data center provider or the user according to the actual situation. Common fault problems for data centers include, but are not limited to, IDC faults, bank faults, private line faults, and the like.
The failure factor may be understood as the smallest granularity failure problem. The fault factors appear along with the fault problem, and the corresponding fault factors are different according to the different fault problems, and the number of the fault factors is determined by the fault problem corresponding to the fault factors.
Taking IDC faults as an example, the fault factors that accompany IDC faults include, but are not limited to, one or more of the following fault factors: system success rate, average time consumption comparability and service success rate.
Taking a bank fault as an example, the fault factors that accompany the bank fault include, but are not limited to, one or more of the following fault factors: read timeout rate, read timeout rate loop ratio, average elapsed time.
Taking the private line fault as an example, the fault factors that accompany the private line fault include, but are not limited to, one or more of the following fault factors: total number of transactions, number of failed transactions, success rate, average elapsed time.
In the present embodiment, the status information of the failure factor includes a normal state and an abnormal state. The normal state indicates that the fault factor has not occurred, and the abnormal state indicates that the fault factor has occurred. For example, taking the failure factor of the system success rate as an example, when the failure factor occurs, it indicates that an abnormality of the system success rate occurs, and when the failure factor does not occur, it indicates that an abnormality of the system success rate does not occur.
Further, in order to quantify the state information of a fault factor, the normal state is denoted as S = 0, and the abnormal state is denoted as S = 1.
S102, determining a weight coefficient of the fault factor according to a pre-established hierarchical structure of the fault problem.
In this embodiment, the weight coefficient of each fault factor is calculated through the hierarchical structure. Compared with estimating the weight coefficients from manual experience, this gives the calculated occurrence probability of the fault problem a more scientific basis.
Further, before step S101, the following steps are also included:
S104, establishing a hierarchical structure of the fault problem.
In this embodiment, the layered structure sequentially includes a total target layer, a sub-target layer, and a failure factor layer from top to bottom, where the total target layer includes a total target, the sub-target layer includes M control targets, the failure factor layer includes N failure factors, M, N are positive integers, and the M control targets are obtained by subdividing the total target.
For example, the fault problem is an IDC fault problem, and the total target of the total target layer is an IDC fault problem; the control targets of the sub-target layer are whether the service is normal or not and whether the system is normal or not respectively, and the fault factors of the fault factor layer are respectively as follows: system success rate, average time consumption comparability and service success rate.
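The three-layer structure can be held as plain data. The following Python sketch is illustrative only: the class name `Hierarchy` and its field names are assumptions introduced here, not part of the described method, and the values are the IDC example from the paragraph above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hierarchy:
    """Three layers: one total target, M control targets, N fault factors."""
    total_target: str
    control_targets: List[str]   # sub-target layer
    fault_factors: List[str]     # fault factor layer

    @property
    def M(self) -> int:
        return len(self.control_targets)

    @property
    def N(self) -> int:
        return len(self.fault_factors)


# IDC example: two control targets obtained by subdividing the total target.
idc = Hierarchy(
    total_target="IDC fault problem",
    control_targets=["service is normal", "system is normal"],
    fault_factors=[
        "system success rate",
        "average time consumption comparability",
        "service success rate",
    ],
)
```

Here `idc.M` is 2 and `idc.N` is 3, matching the M control targets and N fault factors of the layered structure.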
S103, determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor.
In this embodiment, after the state information of the fault factor and the corresponding weight coefficient are obtained, the probability of occurrence of the fault problem can be obtained based on a preset algorithm, and the preset algorithm is set according to the implementation situation.
For example, the probability of occurrence of a fault problem is calculated according to the following algorithm:
$$y_t = w_1 S_1 + w_2 S_2 + w_3 S_3 + \dots + w_n S_n + \dots + w_N S_N$$
wherein $y_t$ represents the occurrence probability of the fault problem at time t, with value range [0, 1]; a larger value indicates a higher possibility of a fault. $S_n$ represents the state information of the nth fault factor: the normal state is recorded as $S_n = 0$, and the abnormal state is recorded as $S_n = 1$. $w_n$ represents the weight coefficient of the nth fault factor, wherein n is an integer between 1 and N, and the weight coefficients satisfy $w_1 + w_2 + w_3 + \dots + w_n + \dots + w_N = 1$.
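As a minimal sketch of this weighted sum, the Python function below computes the occurrence probability for one monitoring instant. The function name `fault_probability` and the numeric weights are hypothetical; the input checks simply enforce the constraints stated above (states are 0 or 1, weights sum to 1).

```python
from typing import Sequence


def fault_probability(weights: Sequence[float], states: Sequence[int]) -> float:
    """Return y_t = w_1*S_1 + ... + w_N*S_N for one monitoring instant."""
    if len(weights) != len(states):
        raise ValueError("weights and states must have the same length")
    if any(s not in (0, 1) for s in states):
        raise ValueError("each state must be 0 (normal) or 1 (abnormal)")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, states))


# Hypothetical weights for three fault factors; the 1st and 3rd are abnormal.
y_t = fault_probability([0.5, 0.3, 0.2], [1, 0, 1])
```

With these assumed inputs the result is approximately 0.7, the sum of the weights of the two abnormal factors.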
According to the method for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved.
For a better understanding, the scale is briefly introduced here. A scale may be understood as quantitatively characterizing the relative importance between two factors, the scale being set according to the rank of relative importance, the higher the rank of relative importance, the larger the scale.
For example, the relative importance of factor A compared with factor B is qualitatively divided into several grades according to the actual situation: equally important, slightly important, significantly important, and absolutely important.
When factor A and factor B are equally important, the scale of factor A relative to factor B is set to 1, and conversely the scale of factor B relative to factor A is set to 1;
when factor A is slightly more important than factor B, the scale of factor A relative to factor B is set to 3, and conversely the scale of factor B relative to factor A is set to 1/3;
when factor A is significantly more important than factor B, the scale of factor A relative to factor B is set to 5, and conversely the scale of factor B relative to factor A is set to 1/5;
when factor A is absolutely more important than factor B, the scale of factor A relative to factor B is set to 9, and conversely the scale of factor B relative to factor A is set to 1/9.
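The grade-to-scale mapping and its reciprocal rule can be sketched in Python as follows. The names `SCALE` and `comparison_matrix` are illustrative assumptions; exact fractions are used so that reciprocals such as 1/3 are not rounded. Pairs for which no grade is supplied are left at 1 (equally important), which is also an assumption of this sketch.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Scale values for the four grades described above.
SCALE = {
    "equally important": 1,
    "slightly important": 3,
    "significantly important": 5,
    "absolutely important": 9,
}


def comparison_matrix(
    k: int, grades: Dict[Tuple[int, int], str]
) -> List[List[Fraction]]:
    """Build a K x K reciprocal matrix from grades given for pairs (i, j)."""
    matrix = [[Fraction(1)] * k for _ in range(k)]
    for (i, j), grade in grades.items():
        matrix[i][j] = Fraction(SCALE[grade])  # scale of factor i relative to factor j
        matrix[j][i] = 1 / matrix[i][j]        # reciprocal for the reverse comparison
    return matrix


# Hypothetical judgement: factor 0 is slightly more important than factor 1.
m = comparison_matrix(2, {(0, 1): "slightly important"})
```

The resulting matrix has 3 at position (0, 1) and 1/3 at position (1, 0), with 1 on the diagonal.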
The following example is a consistency check flow performed on the created matrix:
First, the consistency index C.I. of the created matrix is calculated according to the following formula:
$$\mathrm{C.I.} = \frac{\lambda_{\max} - K}{K - 1}$$
wherein $\lambda_{\max}$ is the maximum characteristic root (largest eigenvalue) of the created matrix, and K is the number of elements of the characteristic vector corresponding to the created matrix, that is, the order of the matrix.
Second, the corresponding average random consistency index R.I. is determined by table lookup.
Table 1 shows, as an example, the average random consistency index R.I. obtained from 1000 matrix calculations of orders 1 to 15.
TABLE 1
Order   3     4     5     6     7     8     9     10    11    12    13    14    15
R.I.    0.52  0.89  1.12  1.26  1.36  1.41  1.46  1.49  1.52  1.54  1.56  1.58  1.59
Third, the consistency ratio C.R. is calculated according to the following formula:
$$\mathrm{C.R.} = \frac{\mathrm{C.I.}}{\mathrm{R.I.}}$$
when the C.R. < 0.1, the consistency check of the created matrix passes;
when C.R. ≥ 0.1, the consistency check of the created matrix fails, and the created matrix needs to be corrected.
Wherein, the first-order matrix and the second-order matrix do not need to be subjected to consistency check.
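The consistency check flow above can be sketched in Python as follows. This is a sketch under stated assumptions: the text does not say how the maximum characteristic root is computed, so power iteration is used here as one simple option, and the function names and the example matrix are hypothetical. The R.I. values are copied from Table 1.

```python
from typing import List, Tuple

# Average random consistency index R.I. by matrix order, from Table 1.
RANDOM_INDEX = {3: 0.52, 4: 0.89, 5: 1.12, 6: 1.26, 7: 1.36, 8: 1.41, 9: 1.46,
                10: 1.49, 11: 1.52, 12: 1.54, 13: 1.56, 14: 1.58, 15: 1.59}


def max_eigenvalue(matrix: List[List[float]], iterations: int = 200) -> float:
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    k = len(matrix)
    vec = [1.0 / k] * k
    lam = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        lam = sum(nxt)  # vec sums to 1, so this sum converges to lambda_max
        vec = [x / lam for x in nxt]
    return lam


def consistency_ratio(matrix: List[List[float]]) -> Tuple[float, float]:
    """Return (C.I., C.R.) for a matrix of order 3 to 15."""
    k = len(matrix)
    ci = (max_eigenvalue(matrix) - k) / (k - 1)
    return ci, ci / RANDOM_INDEX[k]


# Hypothetical, perfectly consistent 3 x 3 matrix built from weights 0.6 : 0.3 : 0.1.
a = [[1, 2, 6], [1 / 2, 1, 3], [1 / 6, 1 / 3, 1]]
ci, cr = consistency_ratio(a)
passes = cr < 0.1
```

For a perfectly consistent matrix the largest eigenvalue equals the order K, so C.I. is numerically zero and the check passes. Orders 1 and 2 are deliberately absent from `RANDOM_INDEX` because such matrices need no check.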
Further, with reference to fig. 2, on the basis of the embodiment shown in fig. 1, the specific implementation manner of step S103 is:
S1031, acquiring a weight coefficient of the mth control target and acquiring a relative weight coefficient of the mth control target and each fault factor.
Specifically, m = 1, 2, 3, ..., M.
In this embodiment, in order to calculate the weight coefficient of each control target, it is necessary to first qualitatively determine the relative importance between the control targets, then quantitatively express the relative importance between any two control targets using the scale, and finally save the determined scales to the first scale table.
In one possible implementation manner, "obtaining the weighting factor of the mth control target" is implemented as follows: acquiring a pre-established first scale table, wherein the first scale table stores scales between control targets; creating a matrix according to each scale, and determining a characteristic vector of the matrix; and determining the weight coefficient of the mth control target according to the feature vector.
For example, the sub-target layer includes M = 2 control targets. The first scale table stores the scale $b_{12}$ of the 1st control target relative to the 2nd control target, and the scale $b_{21} = 1/b_{12}$ of the 2nd control target relative to the 1st control target.
The created matrix is then:
$$\begin{pmatrix} 1 & b_{12} \\ 1/b_{12} & 1 \end{pmatrix}$$
after a matrix is constructed, the eigenvector of the matrix is obtained, the obtained eigenvector comprises two elements, the first element is used as the weight coefficient of the 1 st control target, and the second element is used as the weight coefficient of the 2 nd control target.
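One way to obtain such an eigenvector is power iteration, sketched below in Python. Both the choice of power iteration and the scaling of the vector so that its elements sum to 1 are assumptions of this sketch; the text only states that the eigenvector is obtained. The function name and the value of `b12` are hypothetical.

```python
from typing import List


def principal_eigenvector(matrix: List[List[float]], iterations: int = 200) -> List[float]:
    """Approximate the principal eigenvector by power iteration, scaled to sum to 1."""
    k = len(matrix)
    vec = [1.0 / k] * k
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        total = sum(nxt)
        vec = [x / total for x in nxt]
    return vec


b12 = 3.0  # hypothetical scale: control target 1 is slightly more important than target 2
v = principal_eigenvector([[1.0, b12], [1.0 / b12, 1.0]])
```

With the assumed scale of 3, the first element (weight of the 1st control target) comes out near 0.75 and the second near 0.25.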
Determining the weight coefficients of the control targets involves both a step of qualitatively determining the relative importance between the control targets and a step of quantitatively determining the scales, so certain errors are inevitable. To improve the accuracy of the determined weight coefficients, a consistency check is performed on the created matrix. If the check passes, the determined weight coefficients of the control targets are taken as the target weight coefficients. If it fails, the determined weight coefficients need to be adjusted: the relative importance between the control targets, the scales between the control targets, and the matrix are corrected, and the weight coefficients are recalculated from the corrected matrix until the corrected matrix passes the consistency check. For how to perform the consistency check on the created matrix, see the example consistency check flow described above.
In a possible implementation manner, "obtaining the relative weight coefficient of the mth control target and each fault factor" is specifically implemented as follows:
S1, obtaining a pre-established second scale table that holds the scales between the fault factors associated with the mth control target.
In this embodiment, in order to calculate the relative weight coefficient between any one of the control targets and any one of the fault factors, it is necessary to determine each fault factor affecting the control target, determine qualitatively the relative importance between each fault factor related to the control target, determine quantitatively the relative importance between the fault factors by using the scale, and finally save the determined scale to the second scale table.
S2, creating a matrix according to each scale, and determining the characteristic vector of the matrix.
And S3, determining the relative weight coefficient of the mth control target and each fault factor according to the feature vector.
For example, to determine the relative weight coefficients of the 1st control target and each fault factor, first, the scale of the 1st and 2nd fault factors is recorded as $c_{12}^{(1)}$, the scale of the 1st and 3rd fault factors is recorded as $c_{13}^{(1)}$, and the scale of the 2nd and 3rd fault factors is recorded as $c_{23}^{(1)}$.
The constructed matrix is then as follows:
$$\begin{pmatrix} 1 & c_{12}^{(1)} & c_{13}^{(1)} \\ 1/c_{12}^{(1)} & 1 & c_{23}^{(1)} \\ 1/c_{13}^{(1)} & 1/c_{23}^{(1)} & 1 \end{pmatrix}$$
secondly, after the matrix is constructed, the eigenvector of the matrix is obtained, the obtained eigenvector comprises three elements, the first element is used as the relative weight coefficient of the 1 st control target and the 1 st fault factor, the second element is used as the relative weight coefficient of the 1 st control target and the 2 nd fault factor, and the third element is used as the relative weight coefficient of the 1 st control target and the 3 rd fault factor.
Similarly, to determine the relative weight coefficients of the 2nd control target and each fault factor, first, the scale of the 1st and 2nd fault factors is recorded as $c_{12}^{(2)}$, the scale of the 1st and 3rd fault factors is recorded as $c_{13}^{(2)}$, and the scale of the 2nd and 3rd fault factors is recorded as $c_{23}^{(2)}$.
The constructed matrix is then as follows:
$$\begin{pmatrix} 1 & c_{12}^{(2)} & c_{13}^{(2)} \\ 1/c_{12}^{(2)} & 1 & c_{23}^{(2)} \\ 1/c_{13}^{(2)} & 1/c_{23}^{(2)} & 1 \end{pmatrix}$$
after a matrix is constructed, the eigenvector of the matrix is obtained, the obtained eigenvector comprises three elements, the first element is used as the relative weight coefficient of the 2 nd control target and the 1 st fault factor, the second element is used as the relative weight coefficient of the 2 nd control target and the 2 nd fault factor, and the third element is used as the relative weight coefficient of the 2 nd control target and the 3 rd fault factor.
Determining the relative weight coefficients of a control target and the fault factors involves both a step of qualitatively determining the relative importance of the fault factors affecting the control target and a step of quantitatively determining the scales, so certain errors are inevitable. To improve the accuracy of the determined relative weight coefficients, a consistency check is performed on the created matrix. If the check passes, the determined relative weight coefficients of the control target and the fault factors are taken as the target relative weight coefficients. If it fails, the determined relative weight coefficients need to be adjusted: the relative importance between the fault factors of the control target and the scales between the fault factors are corrected, the matrix is corrected accordingly, and the relative weight coefficients are recalculated from the corrected matrix until the corrected matrix passes the consistency check. For how to perform the consistency check on the created matrix, see the example consistency check flow described above.
S1032, aiming at the nth fault factor, determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor.
Specifically, n = 1, 2, 3, ..., N.
In a possible implementation manner, the specific implementation manner of step S1032 is: determining the weight coefficient $w_n$ of the nth fault factor according to the following summation formula:
$$w_n = \sum_{m=1}^{M} v_m \, u_n^{(m)}$$
wherein $v_m$ represents the weight coefficient of the mth control target, and $u_n^{(m)}$ represents the relative weight coefficient of the mth control target and the nth fault factor.
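The summation in step S1032 can be sketched in Python as follows. The function name `factor_weights` and all numeric values are hypothetical. Here `v[m]` holds the weight coefficient of a control target and `u[m][n]` holds the relative weight coefficient of that control target and a fault factor, using 0-based indices.

```python
from typing import List, Sequence


def factor_weights(v: Sequence[float], u: Sequence[Sequence[float]]) -> List[float]:
    """Return [w_1, ..., w_N], where w_n is the sum over m of v[m] * u[m][n]."""
    n_factors = len(u[0])
    if len(u) != len(v) or any(len(row) != n_factors for row in u):
        raise ValueError("u must have one row of N relative weights per control target")
    return [sum(v[m] * u[m][n] for m in range(len(v))) for n in range(n_factors)]


# Hypothetical values for M = 2 control targets and N = 3 fault factors.
v = [0.75, 0.25]
u = [[0.5, 0.25, 0.25],
     [0.0, 0.5, 0.5]]
w = factor_weights(v, u)
```

With these assumed inputs, `w` is `[0.375, 0.3125, 0.3125]`. Because `v` and each row of `u` sum to 1 in this example, the resulting fault factor weights also sum to 1, as the probability formula requires.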
According to the method for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved. Meanwhile, in the process of calculating the weight coefficient of the fault factor, qualitative analysis and quantitative analysis are combined, and the decision thinking process of a decision maker on the complex object is systematized, modeled and mathematized, so that the weight coefficient of the fault factor can be calculated more scientifically and effectively, and the fault recognition rate is improved.
Fig. 3 is a schematic structural diagram of a fault problem analysis apparatus of a data center according to an embodiment of the present invention. As shown in Fig. 3, the apparatus includes: a monitoring module 31, a weight coefficient determination module 32, and a probability calculation module 33.
A monitoring module 31, configured to monitor status information of each fault factor of the fault problem;
a weight coefficient determination module 32, configured to determine a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem;
and a probability calculation module 33, configured to determine the probability of occurrence of the fault problem according to the state information and the weight coefficient of each fault factor.
Further, the apparatus further comprises:
an establishing module, configured to establish a hierarchical structure of the fault problem, wherein the hierarchical structure comprises, in order from top to bottom, a total target layer, a sub-target layer and a fault factor layer; the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M and N are positive integers, and the M control targets are obtained by subdividing the total target.
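The three-layer structure built by the establishing module might be represented as follows. This is a sketch under assumptions: the class name, field names, and the sample target and factor names are hypothetical and only illustrate the total target layer, the M control targets, and the N fault factors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaultHierarchy:
    """Hypothetical container for the three layers of the hierarchy."""
    total_target: str            # total target layer: a single total target
    control_targets: List[str]   # sub-target layer: M control targets
    fault_factors: List[str]     # fault factor layer: N fault factors

    def __post_init__(self):
        # M and N must both be positive integers.
        if not self.control_targets or not self.fault_factors:
            raise ValueError("at least one control target and one fault factor are required")

    @property
    def m(self) -> int:
        return len(self.control_targets)

    @property
    def n(self) -> int:
        return len(self.fault_factors)


# Hypothetical example; the names are not taken from the document.
hierarchy = FaultHierarchy(
    total_target="service unavailable",
    control_targets=["network", "storage"],
    fault_factors=["link down", "disk latency high", "packet loss"],
)
```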
It should be noted that the explanation of the embodiment of the method for analyzing the fault problem of the data center is also applicable to the apparatus for analyzing the fault problem of the data center of the embodiment, and details are not repeated here.
According to the device for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved.
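The final step, combining monitored states with weight coefficients, could look like the sketch below. The document states only that the probability is determined from the state information and the weight coefficients; the specific rule used here (the share of total weight carried by fault factors in the abnormal state) is an assumption for illustration, as are the function and key names.

```python
def fault_probability(states, weights):
    """Assumed rule: weight of abnormal fault factors divided by total weight.

    states:  dict mapping fault factor name to "normal" or "abnormal".
    weights: dict mapping fault factor name to its weight coefficient.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Factors missing from `states` are treated as normal in this sketch.
    abnormal = sum(
        weight for name, weight in weights.items()
        if states.get(name) == "abnormal"
    )
    return abnormal / total
```

Dividing by the total weight keeps the result in the range 0 to 1 even when the supplied weights are not normalized.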
Further, referring to fig. 4 in combination, on the basis of the embodiment shown in fig. 3, the weight coefficient determining module 32 includes a first unit 321 and a second unit 322.
The first unit 321 is configured to obtain a weight coefficient of an mth control target and obtain a relative weight coefficient of the mth control target and each fault factor, where m = 1, 2, 3, ..., M.
The second unit 322 is configured to, for an nth fault factor, determine a weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor, where n = 1, 2, 3, ..., N.
Further, the first unit 321 is specifically configured to:
acquiring a pre-established scale table, wherein the scale table stores the mth control target and the scale of each fault factor;
creating a matrix according to the scales, and determining an eigenvector of the matrix;
and determining the relative weight coefficient of the mth control target and each fault factor according to the eigenvector.
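The scale table, matrix, and eigenvector steps of the first unit can be sketched as follows. The sketch assumes a reciprocal pairwise-comparison matrix, as is conventional in the analytic hierarchy process, and uses power iteration to approximate the principal eigenvector; the table layout, function names, and sample scales are hypothetical.

```python
def build_matrix(factors, scale_table):
    """Build a reciprocal matrix from pairwise scales.

    factors:     ordered list of fault factor names.
    scale_table: dict mapping (factor_a, factor_b) to the scale of a
                 relative to b; the reverse entry is filled as 1 / scale.
    """
    index = {name: k for k, name in enumerate(factors)}
    size = len(factors)
    matrix = [[1.0] * size for _ in range(size)]
    for (a, b), scale in scale_table.items():
        i, j = index[a], index[b]
        matrix[i][j] = float(scale)
        matrix[j][i] = 1.0 / float(scale)
    return matrix


def principal_eigenvector(matrix, iterations=200, tol=1e-12):
    """Approximate the principal eigenvector by power iteration.

    The result is normalized to sum to 1 so it can be read as weights.
    """
    size = len(matrix)
    vec = [1.0 / size] * size
    for _ in range(iterations):
        nxt = [sum(a * v for a, v in zip(row, vec)) for row in matrix]
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        if max(abs(x - y) for x, y in zip(nxt, vec)) < tol:
            return nxt
        vec = nxt
    return vec


# Hypothetical scales for three fault factors under one control target.
example_matrix = build_matrix(
    ["f1", "f2", "f3"],
    {("f1", "f2"): 2, ("f1", "f3"): 4, ("f2", "f3"): 2},
)
example_relative_weights = principal_eigenvector(example_matrix)
```

Power iteration is used here only to keep the sketch free of third-party dependencies; any eigenvalue routine that returns the eigenvector of the largest eigenvalue would serve the same purpose.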
Further, the second unit 322 is specifically configured to:
determining the weight coefficient $w_n$ of the nth fault factor according to the following formula:

$$w_n = \sum_{m=1}^{M} v_m \, u_{mn}$$

where $v_m$ denotes the weight coefficient of the mth control target, and $u_{mn}$ denotes the relative weight coefficient of the mth control target and the nth fault factor.
It should be noted that the explanation of the embodiment of the method for analyzing the fault problem of the data center is also applicable to the apparatus for analyzing the fault problem of the data center of the embodiment, and details are not repeated here.
According to the device for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved. Meanwhile, in the process of calculating the weight coefficient of the fault factor, qualitative analysis and quantitative analysis are combined, and the decision thinking process of a decision maker on the complex object is systematized, modeled and mathematized, so that the weight coefficient of the fault factor can be calculated more scientifically and effectively, and the fault recognition rate is improved.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention. The computer device includes:
memory 1001, processor 1002, and computer programs stored on memory 1001 and executable on processor 1002.
The processor 1002 executes the program to implement the method for analyzing the failure problem of the data center provided in the above-described embodiment.
Further, the computer device further comprises:
a communication interface 1003 for communicating between the memory 1001 and the processor 1002.
A memory 1001 for storing computer programs that may be run on the processor 1002.
Memory 1001 may include high-speed RAM memory and may also include non-volatile memory (e.g., at least one disk memory).
The processor 1002 is configured to implement the method for analyzing the failure problem of the data center according to the foregoing embodiment when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, the communication interface 1003, the memory 1001, and the processor 1002 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on one chip, the memory 1001, the processor 1002, and the communication interface 1003 may complete communication with each other through an internal interface.
The processor 1002 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present invention.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of analyzing a failure problem of a data center as described above.
The present invention also provides a computer program product, wherein, when instructions in the computer program product are executed by a processor, a method for analyzing a fault problem of a data center is performed, the method comprising:
monitoring state information of each fault factor of the fault problem;
determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem;
and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (7)

1. A method for analyzing a fault problem of a data center is characterized by comprising the following steps:
monitoring the state information of each fault factor of the fault problem, wherein the state information of the fault factors comprises a normal state and an abnormal state;
determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem, wherein the fault problem is an event or phenomenon that a data center loses or reduces a specified function of the data center, and the fault factor is the fault problem with the smallest granularity, the hierarchical structure comprises a total target layer, a sub-target layer and a fault factor layer, the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M, N are positive integers, and the M control targets are obtained by subdividing the total target;
determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor;
wherein determining the weight coefficient of each fault factor according to the pre-established hierarchical structure of the fault problem comprises:
acquiring a weight coefficient of an mth control target and acquiring a relative weight coefficient of the mth control target and each fault factor, wherein m = 1, 2, 3, ..., M;
and for the nth fault factor, determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor, wherein n = 1, 2, 3, ..., N.
2. The method of claim 1, wherein the determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target to the nth fault factor for the nth fault factor comprises:
determining the weight coefficient $w_n$ of the nth fault factor according to the following formula:

$$w_n = \sum_{m=1}^{M} v_m \, u_{mn}$$

where $v_m$ denotes the weight coefficient of the mth control target, and $u_{mn}$ denotes the relative weight coefficient of the mth control target and the nth fault factor.
3. The method of claim 1, wherein obtaining a relative weight coefficient of the mth control target to each fault factor comprises:
acquiring a pre-established second scale table which stores scales between fault factors related to the mth control target;
creating a matrix according to the scales, and determining an eigenvector of the matrix;
and determining the relative weight coefficient of the mth control target and each fault factor according to the eigenvector.
4. A failure problem analysis device of a data center is characterized by comprising:
the monitoring module is used for monitoring the state information of each fault factor of the fault problem, and the state information of the fault factors comprises a normal state and an abnormal state;
the weight coefficient determining module is used for determining the weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem, wherein the fault problem is an event or phenomenon that a data center loses or reduces a specified function of the data center, and the fault factor is the fault problem with the smallest granularity;
the probability calculation module is used for determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor;
the building module is used for building a layered structure of the fault problem, wherein the layered structure sequentially comprises a total target layer, a sub-target layer and a fault factor layer from top to bottom, the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M, N are positive integers, and the M control targets are obtained by subdividing the total target;
the weight coefficient determination module comprises a first unit and a second unit;
the first unit is used for acquiring a weight coefficient of an mth control target and acquiring a relative weight coefficient of the mth control target and each fault factor, wherein m = 1, 2, 3, ..., M;
and the second unit is used for determining, for the nth fault factor, the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor, wherein n = 1, 2, 3, ..., N.
5. A computer device, comprising:
memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements a method for analyzing a failure problem of a data center according to any one of claims 1 to 3.
6. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the method of analyzing a failure problem of a data center according to any one of claims 1 to 3.
7. A computer program product, wherein, when instructions in the computer program product are executed by a processor, a method for analyzing a failure problem of a data center is performed, the method comprising:
monitoring the state information of each fault factor of the fault problem, wherein the state information of the fault factors comprises a normal state and an abnormal state;
determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem, wherein the fault problem is an event or phenomenon that a data center loses or reduces a specified function of the data center, and the fault factor is the fault problem with the smallest granularity, the hierarchical structure comprises a total target layer, a sub-target layer and a fault factor layer, the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M, N are positive integers, and the M control targets are obtained by subdividing the total target;
determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor;
wherein determining the weight coefficient of each fault factor according to the pre-established hierarchical structure of the fault problem comprises:
acquiring a weight coefficient of an mth control target and acquiring a relative weight coefficient of the mth control target and each fault factor, wherein m = 1, 2, 3, ..., M;
and for the nth fault factor, determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor, wherein n = 1, 2, 3, ..., N.
CN201811525441.8A 2018-12-13 2018-12-13 Method and device for analyzing fault problem of data center Active CN109726084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811525441.8A CN109726084B (en) 2018-12-13 2018-12-13 Method and device for analyzing fault problem of data center

Publications (2)

Publication Number Publication Date
CN109726084A CN109726084A (en) 2019-05-07
CN109726084B true CN109726084B (en) 2021-10-15

Family

ID=66295669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811525441.8A Active CN109726084B (en) 2018-12-13 2018-12-13 Method and device for analyzing fault problem of data center

Country Status (1)

Country Link
CN (1) CN109726084B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112020087B (en) * 2019-05-30 2023-04-28 中国移动通信集团浙江有限公司 Tunnel fault monitoring method and device and computing equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914482A (en) * 2013-01-07 2014-07-09 上海宝信软件股份有限公司 CMDB (Configuration Management Date Base) based centralized monitoring event influence determination method
CN103490938A (en) * 2013-10-15 2014-01-01 河海大学 Layering-based cloud service combination failure recovery system and method
US9798598B2 (en) * 2013-11-26 2017-10-24 International Business Machines Corporation Managing faults in a high availability system
CN104486406A (en) * 2014-12-15 2015-04-01 浪潮电子信息产业股份有限公司 Layered resource monitoring method based on cloud data center
CN104915730A (en) * 2015-06-09 2015-09-16 西北工业大学 Device multi-attribute maintenance decision method based on weight
CN107465174A (en) * 2017-10-11 2017-12-12 广东电网有限责任公司电力科学研究院 A kind of fault protecting method and device of the distribution system containing energy storage

Also Published As

Publication number Publication date
CN109726084A (en) 2019-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant