CN109726084B - Method and device for analyzing fault problem of data center

Info

Publication number: CN109726084B
Application number: CN201811525441.8A
Authority: CN
Inventors: 贾艳成; 李仲夷; 张驰; 时旭; 黄凯鸿
Original assignee: NetsUnion Clearing Corp
Current assignee: NetsUnion Clearing Corp
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2021-10-15
Anticipated expiration: 2038-12-13
Also published as: CN109726084A

Abstract

The invention provides a method and a device for analyzing a fault problem of a data center, wherein the method comprises the following steps: monitoring state information of each fault factor of the fault problem; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved.

Description

Method and device for analyzing fault problem of data center

Technical Field

The invention relates to the technical field of fault identification, in particular to a method and a device for analyzing a fault problem of a data center.

Background

With the development of big data technology, the development of data centers is more and more rapid. The data center provides various services with large scale, high quality, safety and reliability for internet content providers, enterprises, media, various websites and the like. Therefore, it is important to ensure safe and reliable operation of the data center.

When a data center is in operation, system problems, hardware failures, and the like cause fault problems such as IDC (Internet Data Center) faults, bank faults, and private line faults. It is therefore necessary to detect a fault problem in time, so that the fault can be cleared promptly and the data center can quickly return to normal operation.

When a fault problem occurs, several fault factors become abnormal at the same time. In the related art, in order to ensure safe and reliable operation of a data center, the occurrence probability of a fault problem is calculated by monitoring the states of the fault factors that accompany the fault problem and manually estimating a weight coefficient for each fault factor. However, manually estimated weight coefficients vary with individual experience and therefore carry errors; the calculated occurrence probability lacks scientific support, and the recognition rate of fault problems is reduced.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, a first objective of the present invention is to provide an analysis method for failure problems of a data center, which is used to solve the problem of low recognition rate of failure problems in the prior art.

A second object of the present invention is to provide an apparatus for analyzing a failure problem in a data center.

A third object of the invention is to propose a computer device.

A fourth object of the invention is to propose a non-transitory computer-readable storage medium.

A fifth object of the invention is to propose a computer program product.

In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for analyzing a failure problem of a data center, including:

monitoring state information of each fault factor of the fault problem;

determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem;

and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor.

Further, the method further comprises:

and establishing a layered structure of the fault problem, wherein the layered structure sequentially comprises a total target layer, a sub-target layer and a fault factor layer from top to bottom, the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M, N are positive integers, and the M control targets are obtained by subdividing the total target.

Further, the determining a weight coefficient of the fault factor according to the pre-established hierarchical structure of the fault problem includes:

acquiring a weight coefficient of an mth control target and a relative weight coefficient of the mth control target and each fault factor, wherein m = 1, 2, 3, ..., M;

and for the nth fault factor, determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor, wherein n = 1, 2, 3, ..., N.

Further, the determining, for the nth fault factor, the weighting factor of the nth fault factor according to the weighting factors of the M control targets and the relative weighting factor of each control target to the nth fault factor includes:

determining the weight coefficient $w_n$ of the nth fault factor according to the following formula:

$$w_n = \sum_{m=1}^{M} v_m \, u_{mn}$$

wherein $v_m$ denotes the weight coefficient of the mth control target;

$u_{mn}$ denotes the relative weight coefficient of the mth control target and the nth fault factor.

Further, the obtaining of the relative weight coefficient of the mth control target and each fault factor includes:

acquiring a pre-established second scale table which stores scales between fault factors related to the mth control target;

creating a matrix according to each scale, and determining an eigenvector of the matrix;

and determining the relative weight coefficient of the mth control target and each fault factor according to the eigenvector.

According to the method for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved.

In order to achieve the above object, a second embodiment of the present invention provides a failure problem analysis device for a data center, including:

the monitoring module is used for monitoring the state information of each fault factor of the fault problem;

the weight coefficient determining module is used for determining the weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem;

and the probability calculation module is used for determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor.

Further, the apparatus further comprises:

an establishing module, configured to establish a layered structure of the fault problem, wherein the layered structure sequentially comprises a total target layer, a sub-target layer and a fault factor layer from top to bottom, the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M and N are positive integers, and the M control targets are obtained by subdividing the total target.

Further, the weight coefficient determination module comprises a first unit and a second unit;

the first unit is used for acquiring a weight coefficient of an mth control target and acquiring a relative weight coefficient of the mth control target and each fault factor, wherein m = 1, 2, 3, ..., M;

and the second unit is used for determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor, wherein n = 1, 2, 3, ..., N.

Further, the second unit is specifically configured to:

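determine the weight coefficient $w_n$ of the nth fault factor as $w_n = \sum_{m=1}^{M} v_m \, u_{mn}$, wherein $v_m$ denotes the weight coefficient of the mth control target and $u_{mn}$ denotes the relative weight coefficient of the mth control target and the nth fault factor.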

Further, the first unit is specifically configured to:

According to the device for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved.

To achieve the above object, a third embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for analyzing the fault problem of the data center described above.

In order to achieve the above object, a fourth aspect of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method for analyzing the failure problem of the data center as described above.

In order to achieve the above object, a fifth aspect of the present invention provides a computer program product, wherein when instructions in the computer program product are executed by a processor, a method for analyzing a failure problem of a data center is performed, the method comprising:

monitoring state information of each fault factor of the fault problem;

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

Fig. 1 is a schematic flowchart of a method for analyzing a fault problem of a data center according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of another method for analyzing a fault problem of a data center according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an apparatus for analyzing a fault problem of a data center according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of another apparatus for analyzing a fault problem of a data center according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting the invention.

The following describes a method and an apparatus for analyzing a failure problem in a data center according to an embodiment of the present invention with reference to the drawings.

Fig. 1 is a schematic flowchart of a method for analyzing a fault problem of a data center according to an embodiment of the present invention. The embodiment provides an analysis method for the fault problem of the data center, the execution subject of the analysis method is an analysis device for the fault problem of the data center, and the execution subject is composed of hardware and/or software.

As shown in fig. 1, the method for analyzing the fault problem of the data center includes:

S101, monitoring the state information of each fault factor of the fault problem.

In this embodiment, a fault problem may be understood as an event or phenomenon in which the data center loses or degrades a specified function, manifested as abnormal operation of the data center. Fault problems are classified by the data center provider or the user according to the actual situation. Common fault problems for data centers include, but are not limited to, IDC faults, bank faults, private line faults, and the like.

The failure factor may be understood as the smallest granularity failure problem. The fault factors appear along with the fault problem, and the corresponding fault factors are different according to the different fault problems, and the number of the fault factors is determined by the fault problem corresponding to the fault factors.

Taking IDC faults as an example, the fault factors that accompany IDC faults include, but are not limited to, one or more of the following fault factors: system success rate, average time consumption comparability and service success rate.

Taking a bank fault as an example, the fault factors that accompany the bank fault include, but are not limited to, one or more of the following fault factors: read timeout rate, period-over-period ratio of the read timeout rate, average elapsed time.

Taking the private line fault as an example, the fault factors that accompany the private line fault include, but are not limited to, one or more of the following fault factors: total number of transactions, number of failed transactions, success rate, average elapsed time.

In the present embodiment, the status information of the failure factor includes a normal state and an abnormal state. The normal state indicates that the fault factor has not occurred, and the abnormal state indicates that the fault factor has occurred. For example, taking the failure factor of the system success rate as an example, when the failure factor occurs, it indicates that an abnormality of the system success rate occurs, and when the failure factor does not occur, it indicates that an abnormality of the system success rate does not occur.

Further, in order to quantify the state information of a fault factor, the normal state is denoted as S = 0, and the abnormal state is denoted as S = 1.
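The quantified state can be sketched as follows. This is only an illustration: the embodiment does not say how an abnormality is detected, so the threshold rule, the `FactorReading` type, and the sample values below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FactorReading:
    name: str
    value: float
    threshold: float            # hypothetical alert threshold (not from the patent)
    abnormal_when_below: bool   # True for metrics such as success rates


def state_of(reading: FactorReading) -> int:
    """Return S = 1 when the fault factor is abnormal, otherwise S = 0."""
    if reading.abnormal_when_below:
        return 1 if reading.value < reading.threshold else 0
    return 1 if reading.value > reading.threshold else 0


# Illustrative readings only; real values would come from monitoring.
readings = [
    FactorReading("system success rate", 0.93, 0.99, abnormal_when_below=True),
    FactorReading("average elapsed time (ms)", 120.0, 500.0, abnormal_when_below=False),
]
states = [state_of(r) for r in readings]
print(states)  # [1, 0]
```

Here the success rate has dropped below its assumed threshold, so its state is 1, while the elapsed time stays within its assumed limit and keeps state 0.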

S102, determining a weight coefficient of the fault factor according to a pre-established hierarchical structure of the fault problem.

In this embodiment, the weight coefficient of each fault factor is calculated from the hierarchical structure. Compared with estimation based on manual experience, this gives the calculated occurrence probability of the fault problem more scientific support.

Further, before step S101, the following steps are also included:

S104, establishing a hierarchical structure of the fault problem.

In this embodiment, the layered structure sequentially includes a total target layer, a sub-target layer, and a failure factor layer from top to bottom, where the total target layer includes a total target, the sub-target layer includes M control targets, the failure factor layer includes N failure factors, M, N are positive integers, and the M control targets are obtained by subdividing the total target.

For example, the fault problem is an IDC fault problem, and the total target of the total target layer is an IDC fault problem; the control targets of the sub-target layer are whether the service is normal or not and whether the system is normal or not respectively, and the fault factors of the fault factor layer are respectively as follows: system success rate, average time consumption comparability and service success rate.
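One way to hold such a three-layer structure in memory is sketched below. The `FaultHierarchy` class is a hypothetical container introduced for illustration; only the layer contents are taken from the IDC example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FaultHierarchy:
    """Hypothetical container for the three layers of one fault problem."""
    total_target: str            # total target layer (one total target)
    control_targets: List[str]   # sub-target layer (M control targets)
    fault_factors: List[str]     # fault factor layer (N fault factors)

    def __post_init__(self) -> None:
        # M and N must be positive integers.
        if not self.control_targets or not self.fault_factors:
            raise ValueError("M and N must be positive integers")

    @property
    def M(self) -> int:
        return len(self.control_targets)

    @property
    def N(self) -> int:
        return len(self.fault_factors)


idc = FaultHierarchy(
    total_target="IDC fault problem",
    control_targets=["whether the service is normal", "whether the system is normal"],
    fault_factors=[
        "system success rate",
        "average time consumption comparability",
        "service success rate",
    ],
)
print(idc.M, idc.N)  # 2 3
```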

S103, determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor.

In this embodiment, after the state information of the fault factor and the corresponding weight coefficient are obtained, the probability of occurrence of the fault problem can be obtained based on a preset algorithm, and the preset algorithm is set according to the implementation situation.

For example, the probability of occurrence of a fault problem is calculated according to the following algorithm:

$$y_t = w_1 S_1 + w_2 S_2 + w_3 S_3 + \dots + w_n S_n + \dots + w_N S_N$$

wherein $y_t$ represents the probability of occurrence of the fault problem at time t, with a value range of [0, 1]; a larger value indicates a higher possibility of failure. $S_n$ represents the state information of the nth fault factor: the normal state is recorded as $S_n = 0$, and the abnormal state is recorded as $S_n = 1$. $w_n$ represents the weight coefficient of the nth fault factor, wherein n is an integer between 1 and N and the weight coefficients satisfy $w_1 + w_2 + w_3 + \dots + w_n + \dots + w_N = 1$.
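The weighted sum can be sketched in a few lines. The function name `fault_probability`, the tolerance, and the sample weights are assumptions made for illustration; the checks simply enforce the constraints stated above (binary states, weights summing to 1).

```python
from typing import Sequence


def fault_probability(weights: Sequence[float], states: Sequence[int],
                      tol: float = 1e-9) -> float:
    """Compute y_t as the sum of w_n * S_n over all N fault factors."""
    if len(weights) != len(states):
        raise ValueError("need one weight per fault factor")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weight coefficients must sum to 1")
    if any(s not in (0, 1) for s in states):
        raise ValueError("each state must be 0 (normal) or 1 (abnormal)")
    return sum(w * s for w, s in zip(weights, states))


# Illustrative weights; factors 1 and 3 are abnormal, factor 2 is normal.
y_t = fault_probability([0.5, 0.3, 0.2], [1, 0, 1])
print(round(y_t, 3))  # 0.7
```

Because every state is 0 or 1 and the weights sum to 1, the result always stays within [0, 1].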

For a better understanding, the scale is briefly introduced here. A scale may be understood as quantitatively characterizing the relative importance between two factors, the scale being set according to the rank of relative importance, the higher the rank of relative importance, the larger the scale.

For example, the relative importance of factor A over factor B is qualitatively divided into several grades according to the actual situation: equally important, slightly important, significantly important, and absolutely important.

When factor A and factor B are equally important, the scale of factor A relative to factor B is set to 1, and the scale of factor B relative to factor A is likewise set to 1;

when factor A is slightly more important than factor B, the scale of factor A relative to factor B is set to 3, and conversely, the scale of factor B relative to factor A is set to 1/3;

when factor A is significantly more important than factor B, the scale of factor A relative to factor B is set to 5, and conversely, the scale of factor B relative to factor A is set to 1/5;

when factor A is absolutely more important than factor B, the scale of factor A relative to factor B is set to 9, and conversely, the scale of factor B relative to factor A is set to 1/9.
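The grades and their reciprocals can be turned into a pairwise comparison matrix as sketched below. The grade keys, the `comparison_matrix` helper, and the sample judgments are hypothetical; the sketch also assumes that a factor compared with itself has scale 1.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Scale for each grade of relative importance, as listed above.
SCALE = {"equal": 1, "slight": 3, "significant": 5, "absolute": 9}


def comparison_matrix(size: int,
                      judgments: Dict[Tuple[int, int], str]) -> List[List[Fraction]]:
    """Build a reciprocal matrix from judgments {(i, j): grade}.

    Each entry states how much more important factor i is than factor j;
    the mirrored entry (j, i) receives the reciprocal scale.
    """
    matrix = [[Fraction(1)] * size for _ in range(size)]
    for (i, j), grade in judgments.items():
        scale = Fraction(SCALE[grade])
        matrix[i][j] = scale
        matrix[j][i] = 1 / scale
    return matrix


# Illustrative judgments for three factors (0-based indices).
matrix = comparison_matrix(3, {(0, 1): "slight", (0, 2): "significant", (1, 2): "slight"})
for row in matrix:
    print([str(x) for x in row])
```

Exact fractions are used here so that entries such as 1/3 are stored without rounding.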

The following example is a consistency check flow performed on the created matrix:

First, the consistency index C.I. of the created matrix is calculated according to the following formula:

$$\mathrm{C.I.} = \frac{\lambda_{\max} - K}{K - 1}$$

wherein $\lambda_{\max}$ denotes the largest eigenvalue of the created matrix, and K denotes the number of elements of the eigenvector corresponding to the created matrix, that is, the order of the matrix.

Second, the corresponding average random consistency index R.I. is determined by table lookup.

Table 1 shows example values of the average random consistency index R.I., obtained from 1000 calculations on matrices of orders 1 to 15.

TABLE 1

Matrix order	3	4	5	6	7	8	9	10	11	12	13	14	15
R.I.	0.52	0.89	1.12	1.26	1.36	1.41	1.46	1.49	1.52	1.54	1.56	1.58	1.59

Third, the consistency ratio C.R. is calculated according to the following formula:

$$\mathrm{C.R.} = \frac{\mathrm{C.I.}}{\mathrm{R.I.}}$$

When C.R. < 0.1, the consistency check of the created matrix passes;

when C.R. ≥ 0.1, the consistency check of the created matrix fails, and the created matrix needs to be corrected appropriately.

First-order and second-order matrices do not need to be subjected to a consistency check.
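The three steps of the check can be sketched as below, taking the largest eigenvalue as an input. The function names and the sample eigenvalues are hypothetical; the R.I. values are those of Table 1, and the index is computed as (largest eigenvalue − order) / (order − 1), the usual form for this kind of check.

```python
# Average random consistency index R.I. by matrix order (values from Table 1).
RANDOM_INDEX = {
    3: 0.52, 4: 0.89, 5: 1.12, 6: 1.26, 7: 1.36, 8: 1.41, 9: 1.46,
    10: 1.49, 11: 1.52, 12: 1.54, 13: 1.56, 14: 1.58, 15: 1.59,
}


def consistency_ratio(lambda_max: float, order: int) -> float:
    """Return C.R. = C.I. / R.I. for a matrix of the given order."""
    if order <= 2:
        # First- and second-order matrices need no consistency check.
        return 0.0
    ci = (lambda_max - order) / (order - 1)
    return ci / RANDOM_INDEX[order]


def passes_consistency_check(lambda_max: float, order: int) -> bool:
    """The check passes when C.R. < 0.1."""
    return consistency_ratio(lambda_max, order) < 0.1


# Illustrative eigenvalues for third-order matrices.
print(passes_consistency_check(3.0385, 3))  # True  (C.R. is about 0.037)
print(passes_consistency_check(3.5, 3))     # False (C.R. is about 0.48)
```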

Further, with reference to fig. 2, on the basis of the embodiment shown in fig. 1, the specific implementation manner of step S103 is:

S1031, acquiring the weight coefficient of the mth control target and the relative weight coefficient of the mth control target and each fault factor.

Specifically, m = 1, 2, 3, ..., M.

In this embodiment, in order to calculate the weight coefficient of each control target, it is necessary to first qualitatively determine the relative importance between the control targets, then quantitatively express the relative importance between any two control targets using the scale, and finally save the determined scales to the first scale table.

In one possible implementation manner, "obtaining the weight coefficient of the mth control target" is implemented as follows: acquiring a pre-established first scale table, wherein the first scale table stores scales between control targets; creating a matrix according to each scale, and determining an eigenvector of the matrix; and determining the weight coefficient of the mth control target according to the eigenvector.

For example, the sub-target layer includes M = 2 control targets. The first scale table stores the scale $b_{12}$ of the 1st control target relative to the 2nd control target, and the scale of the 2nd control target relative to the 1st control target, which is the reciprocal $1/b_{12}$.

The created matrix is then:

$$\begin{pmatrix} 1 & b_{12} \\ 1/b_{12} & 1 \end{pmatrix}$$

After the matrix is constructed, its eigenvector is obtained. The eigenvector comprises two elements: the first element is used as the weight coefficient of the 1st control target, and the second element is used as the weight coefficient of the 2nd control target.
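As a worked check of the two-target case, take the second-order matrix with ones on the diagonal, $b_{12}$ in the upper-right position and its reciprocal in the lower-left position. The derivation below assumes the eigenvector is normalized so that its elements sum to 1; the embodiment does not state a normalization, and the value $b_{12} = 3$ is purely illustrative.

```latex
\begin{aligned}
\begin{pmatrix} 1 & b_{12} \\ 1/b_{12} & 1 \end{pmatrix}
\begin{pmatrix} b_{12} \\ 1 \end{pmatrix}
&= \begin{pmatrix} 2 b_{12} \\ 2 \end{pmatrix}
 = 2 \begin{pmatrix} b_{12} \\ 1 \end{pmatrix},
 \qquad \lambda_{\max} = 2, \\
(v_1,\ v_2) &= \left( \frac{b_{12}}{1 + b_{12}},\ \frac{1}{1 + b_{12}} \right)
 \;\xrightarrow{\; b_{12} = 3 \;}\; (0.75,\ 0.25).
\end{aligned}
```

The largest eigenvalue equals the order of the matrix for any positive $b_{12}$, which is why a second-order matrix is always consistent and needs no consistency check.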

Determining the weight coefficients of the control targets involves both a qualitative step (judging the relative importance between the control targets) and a quantitative step (assigning the scale), so certain errors are inevitable. In order to improve the accuracy of the determined weight coefficients, a consistency check is performed on the created matrix. If the check passes, the determined weight coefficients of the control targets are taken as the target weight coefficients. If it fails, the determined weight coefficients need to be adjusted: the relative importance among the control targets, the scales among the control targets, and the matrix are corrected, and the weight coefficients of the control targets are recalculated from the corrected matrix until the corrected matrix passes the consistency check. For how to perform the consistency check on the created matrix, see the consistency check flow in the earlier example.

In a possible implementation manner, "obtaining the relative weight coefficient of the mth control target and each fault factor" is specifically implemented as follows:

S1, obtaining a pre-established second scale table that holds the scales between the fault factors associated with the mth control target.

In this embodiment, in order to calculate the relative weight coefficient between any one of the control targets and any one of the fault factors, it is necessary to determine each fault factor affecting the control target, determine qualitatively the relative importance between each fault factor related to the control target, determine quantitatively the relative importance between the fault factors by using the scale, and finally save the determined scale to the second scale table.

S2, creating a matrix according to each scale, and determining an eigenvector of the matrix.

S3, determining the relative weight coefficient of the mth control target and each fault factor according to the eigenvector.

For example, to determine the relative weight coefficient of the 1st control target and each fault factor, first, the scales between the 1st and 2nd fault factors, between the 1st and 3rd fault factors, and between the 2nd and 3rd fault factors are recorded with respect to the 1st control target.

A third-order matrix is then constructed from these scales and their reciprocals.

Second, after the matrix is constructed, its eigenvector is obtained. The eigenvector comprises three elements: the first element is used as the relative weight coefficient of the 1st control target and the 1st fault factor, the second element as the relative weight coefficient of the 1st control target and the 2nd fault factor, and the third element as the relative weight coefficient of the 1st control target and the 3rd fault factor.
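A minimal sketch of obtaining the eigenvector of a third-order scale matrix is given below. Power iteration is used here as one convenient way to find the principal eigenvector; the embodiment does not prescribe a numerical method. The function name, the normalization of the vector to sum to 1, and the scale values 3, 5 and 3 are all assumptions for illustration.

```python
from typing import List, Tuple


def principal_eigenvector(matrix: List[List[float]],
                          iterations: int = 100) -> Tuple[float, List[float]]:
    """Estimate the largest eigenvalue and its eigenvector by power iteration.

    The vector is rescaled to sum to 1 on every step, so the sum of the
    product matrix * vector converges to the largest eigenvalue.
    """
    k = len(matrix)
    vec = [1.0 / k] * k
    lam = float(k)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        lam = sum(product)
        vec = [x / lam for x in product]
    return lam, vec


# Hypothetical scales between three fault factors under one control target.
scales = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
lambda_max, relative_weights = principal_eigenvector(scales)
print(round(lambda_max, 4))
print([round(w, 3) for w in relative_weights])
```

The three elements of `relative_weights` play the role of the relative weight coefficients of the control target and the 1st, 2nd and 3rd fault factors, and `lambda_max` is the value needed for the consistency check.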

Similarly, to determine the relative weight coefficient of the 2nd control target and each fault factor, first, the scales between the 1st and 2nd fault factors, between the 1st and 3rd fault factors, and between the 2nd and 3rd fault factors are recorded with respect to the 2nd control target.

A third-order matrix is then constructed from these scales and their reciprocals.

After the matrix is constructed, its eigenvector is obtained. The eigenvector comprises three elements: the first element is used as the relative weight coefficient of the 2nd control target and the 1st fault factor, the second element as the relative weight coefficient of the 2nd control target and the 2nd fault factor, and the third element as the relative weight coefficient of the 2nd control target and the 3rd fault factor.

Determining the relative weight coefficients of a control target and the fault factors involves both a qualitative step (judging the relative importance of the fault factors that affect the control target) and a quantitative step (assigning the scale), so certain errors are inevitable. In order to improve the accuracy of the determined relative weight coefficients, a consistency check is performed on the created matrix. If the check passes, the determined relative weight coefficients of the control target and the fault factors are taken as the target relative weight coefficients. If it fails, the determined relative weight coefficients need to be adjusted: the relative importance between the fault factors of the control target and the scales between the fault factors are corrected, the matrix is corrected accordingly, and the relative weight coefficients are recalculated from the corrected matrix until the corrected matrix passes the consistency check. For how to perform the consistency check on the created matrix, see the consistency check flow in the earlier example.

S1032, aiming at the nth fault factor, determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient of each control target and the nth fault factor.

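Here, n = 1, 2, 3, ..., N.

In a possible implementation manner, step S1032 is implemented by determining the weight coefficient $w_n$ of the nth fault factor according to the following summation formula:

$$w_n = \sum_{m=1}^{M} v_m \, u_{mn}$$

wherein $v_m$ denotes the weight coefficient of the mth control target, and $u_{mn}$ denotes the relative weight coefficient of the mth control target and the nth fault factor.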
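The summation in step S1032 can be sketched as follows, writing `v[m]` for the weight coefficient of control target m and `u[m][n]` for the relative weight coefficient of control target m and fault factor n. These names, the helper `factor_weights`, and all numeric values are chosen here for illustration only.

```python
from typing import List


def factor_weights(v: List[float], u: List[List[float]]) -> List[float]:
    """Combine the two levels: w[n] is the sum over m of v[m] * u[m][n]."""
    if len(v) != len(u):
        raise ValueError("need one row of relative weights per control target")
    n_factors = len(u[0])
    return [sum(v[m] * u[m][n] for m in range(len(v))) for n in range(n_factors)]


# Illustrative values for M = 2 control targets and N = 3 fault factors.
v = [0.75, 0.25]
u = [
    [0.6, 0.3, 0.1],   # relative weights under control target 1
    [0.2, 0.2, 0.6],   # relative weights under control target 2
]
w = factor_weights(v, u)
print([round(x, 3) for x in w])  # [0.5, 0.275, 0.225]
```

When the control target weights sum to 1 and each row of relative weights sums to 1, the resulting fault factor weights also sum to 1, as the probability formula requires.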

According to the method for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved. Meanwhile, in the process of calculating the weight coefficient of the fault factor, qualitative analysis and quantitative analysis are combined, and the decision thinking process of a decision maker on the complex object is systematized, modeled and mathematized, so that the weight coefficient of the fault factor can be calculated more scientifically and effectively, and the fault recognition rate is improved.

Fig. 3 is a schematic structural diagram of a failure problem analysis apparatus of a data center according to an embodiment of the present invention. As shown in Fig. 3, the apparatus includes: a monitoring module 31, a weight coefficient determination module 32 and a probability calculation module 33.

A monitoring module 31, configured to monitor status information of each fault factor of the fault problem;

a weight coefficient determination module 32, configured to determine a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem;

and a probability calculation module 33, configured to determine the probability of occurrence of the fault problem according to the state information and the weight coefficient of each fault factor.

Further, the apparatus further comprises:

It should be noted that the explanation of the embodiment of the method for analyzing the fault problem of the data center is also applicable to the apparatus for analyzing the fault problem of the data center of the embodiment, and details are not repeated here.

Further, referring to fig. 4 in combination, on the basis of the embodiment shown in fig. 3, the weight coefficient determining module 32 includes a first unit 321 and a second unit 322.

The first unit 321 is configured to obtain a weight coefficient of an mth control target and obtain a relative weight coefficient of the mth control target and each fault factor, where m = 1, 2, 3, ..., M.

The second unit 322 is configured to, for an nth fault factor, determine a weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and a relative weight coefficient of each control target and the nth fault factor, where n = 1, 2, 3, ..., N.

Further, the first unit 321 is specifically configured to:

acquiring a pre-established scale table, wherein the scale table stores the mth control target and the scale of each fault factor;

Further, the second unit 322 is specifically configured to:

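determine the weight coefficient $w_n$ of the nth fault factor as $w_n = \sum_{m=1}^{M} v_m \, u_{mn}$, wherein $v_m$ denotes the weight coefficient of the mth control target and $u_{mn}$ denotes the relative weight coefficient of the mth control target and the nth fault factor.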

According to the device for analyzing the fault problem of the data center, provided by the embodiment of the invention, the state information of each fault factor of the fault problem is monitored; determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem; and determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor. Because the weight coefficient of the fault factor is calculated by adopting the hierarchical structure of the fault problem, compared with a mode of estimating the weight coefficient of the fault factor by manual experience, the method has the advantages that the occurrence probability of the fault problem is more scientifically supported, and the recognition rate of the fault problem is improved. Meanwhile, in the process of calculating the weight coefficient of the fault factor, qualitative analysis and quantitative analysis are combined, and the decision thinking process of a decision maker on the complex object is systematized, modeled and mathematized, so that the weight coefficient of the fault factor can be calculated more scientifically and effectively, and the fault recognition rate is improved.

Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention. The computer device includes:

a memory 1001, a processor 1002, and a computer program stored on the memory 1001 and executable on the processor 1002.

The processor 1002 executes the program to implement the method for analyzing the failure problem of the data center provided in the above-described embodiment.

Further, the computer device further comprises:

a communication interface 1003 for communicating between the memory 1001 and the processor 1002.

The memory 1001 is configured to store computer programs that can run on the processor 1002.

Memory 1001 may include high-speed RAM memory and may also include non-volatile memory (e.g., at least one disk memory).

The processor 1002 is configured to implement the method for analyzing the failure problem of the data center according to the foregoing embodiment when executing the program.

If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, the communication interface 1003, the memory 1001, and the processor 1002 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.

Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on one chip, the memory 1001, the processor 1002, and the communication interface 1003 may complete communication with each other through an internal interface.

The processor 1002 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.

The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of analyzing a failure problem of a data center as described above.

The present invention also provides a computer program product, wherein when instructions in the computer program product are executed by a processor, a method of analyzing a failure problem of a data center is performed, the method comprising:

monitoring state information of each fault factor of the fault problem;

In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided that they do not contradict one another.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A method for analyzing a fault problem of a data center, characterized by comprising the following steps:

monitoring the state information of each fault factor of the fault problem, wherein the state information of the fault factors comprises a normal state and an abnormal state;

determining a weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem, wherein the fault problem is an event or phenomenon in which a data center loses or reduces a specified function of the data center, the fault factor is the fault problem with the smallest granularity, the hierarchical structure comprises a total target layer, a sub-target layer and a fault factor layer, the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M and N are positive integers, and the M control targets are obtained by subdividing the total target;

determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor;

wherein determining the weight coefficient of each fault factor according to the pre-established hierarchical structure of the fault problem comprises:

acquiring a weight coefficient of the mth control target and acquiring a relative weight coefficient between the mth control target and each fault factor, wherein m = 1, 2, 3, …, M;

and for the nth fault factor, determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient between each control target and the nth fault factor, wherein n = 1, 2, 3, …, N.

2. The method of claim 1, wherein determining, for the nth fault factor, the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient between each control target and the nth fault factor comprises:

wherein v_m denotes the weight coefficient of the mth control target;

3. The method of claim 1, wherein acquiring the relative weight coefficient between the mth control target and each fault factor comprises:

4. A device for analyzing a fault problem of a data center, characterized by comprising:

the monitoring module is used for monitoring the state information of each fault factor of the fault problem, and the state information of the fault factors comprises a normal state and an abnormal state;

the weight coefficient determining module is used for determining the weight coefficient of each fault factor according to a pre-established hierarchical structure of the fault problem, wherein the fault problem is an event or phenomenon that a data center loses or reduces a specified function of the data center, and the fault factor is the fault problem with the smallest granularity;

the probability calculation module is used for determining the occurrence probability of the fault problem according to the state information and the weight coefficient of each fault factor;

the building module is used for building a hierarchical structure of the fault problem, wherein the hierarchical structure sequentially comprises a total target layer, a sub-target layer and a fault factor layer from top to bottom, the total target layer comprises a total target, the sub-target layer comprises M control targets, the fault factor layer comprises N fault factors, M and N are positive integers, and the M control targets are obtained by subdividing the total target;

the weight coefficient determining module comprises a first unit and a second unit;

the first unit is used for acquiring a weight coefficient of the mth control target and acquiring a relative weight coefficient between the mth control target and each fault factor, wherein m = 1, 2, 3, …, M;

and the second unit is used for determining the weight coefficient of the nth fault factor according to the weight coefficients of the M control targets and the relative weight coefficient between each control target and the nth fault factor, wherein n = 1, 2, 3, …, N.

5. A computer device, comprising:

a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method for analyzing a failure problem of a data center according to any one of claims 1 to 3.

6. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the method of analyzing a failure problem of a data center according to any one of claims 1 to 3.

7. A computer program product, wherein when instructions in the computer program product are executed by a processor, a method of analyzing a failure problem of a data center is performed, the method comprising: