CN114124738B - Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram - Google Patents
- Publication number
- CN114124738B (application number CN202111300037.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Exchange Systems With Centralized Control (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a method, a system and a terminal for calculating the fault probability of cloud environment services based on a service interaction diagram. The method comprises: extracting service running-state call data from a target data set; establishing a service interaction diagram according to the service running-state call data, and acquiring a change record of the service interaction diagram according to the current time; and calculating the fault probability of each service according to the service interaction diagram and its change record. The invention monitors and inspects the various monitoring data of services in a cloud computing environment, and calculates the probability of a service fault by combining the actual running condition of each service with the call relationships between services. This helps the responsible personnel take emergency measures for services with a high fault probability, so that service outages caused by service faults, and the resulting inconvenience to users, are avoided.
Description
Technical Field
The invention relates to a method, a system and a terminal for calculating the fault probability of cloud environment services based on a service interaction diagram, and belongs to the field of service fault diagnosis in cloud computing.
Background
With the development of cloud computing, more and more services are being migrated from local deployments to the cloud. A cloud service consists of numerous components that are updated frequently and have complex dependencies, which raises the probability of faults and makes them harder to diagnose. To better detect service faults and reduce their impact, actual service operation data and service call relationships need to be analyzed together. Cheng L et al. analyzed the problems in service fault management and proposed a noise filtering algorithm and a Bayesian fault diagnosis algorithm that address the noise and dynamic characteristics of the network. Lingwei et al. analyzed the dynamics of internet services and proposed a method for updating the prior fault probability and the fault propagation model, obtaining good diagnosis results. Yen et al. proposed a graph-based microservice analysis and test method that analyzes and visualizes risky service call chains among microservices through a service dependency graph, and can effectively locate the fault source when a fault occurs. Friedrich et al. proposed a fault diagnosis method that models the behavior of web service interfaces with finite state machines; by calculating interface similarity, behavioral inconsistencies between any two service interfaces can be analyzed and used for fault diagnosis. Mayer et al. proposed a system for monitoring and managing microservices that, through multi-source information fusion, analyzes the cause of a failure in a microservice system more accurately.
Most existing research targets specific fault types or specific software systems. How to determine the probability that a service in a cloud computing environment will fail, by jointly considering service operation data, environment data and service call relationships, remains a problem to be solved.
Disclosure of Invention
The invention provides a method, a system and a terminal for calculating the fault probability of cloud environment services based on a service interaction diagram that comprehensively describes service information, which are used to obtain the fault probability of services in a cloud environment.
The technical scheme of the invention is as follows: a cloud environment service fault probability calculation method based on a service interaction diagram comprises the following steps:
extracting service running state calling data in a target data set;
establishing a service interaction diagram according to service running state calling data, and acquiring a change record of the service interaction diagram according to the current time;
and calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram.
Extracting the service running-state call data from the target data set comprises: reading a target data set monitored in the cloud environment and extracting service call data from it; and screening the service call data to obtain the service call data currently in the running state, which serves as the service running-state call data.
Establishing the service interaction diagram according to the service running-state call data and acquiring the change record of the service interaction diagram according to the current time comprises: obtaining the service user name, service call number, service call state and service name from the service running-state call data, establishing a service interaction diagram represented as an adjacency list, and monitoring new service call data in real time to update the service interaction diagram; and recording the current time and acquiring the change record of the service interaction diagram.
Calculating the fault probability of the service according to the service interaction diagram and its change record comprises: traversing the names of the services in the running state in the service interaction diagram, and reading in turn the operation data and environment data of the service corresponding to each name; screening out the stationary monitoring indexes from the operation data and environment data of the service before the reading time, filtering both the data before the reading time and the data at the reading time by those stationary monitoring indexes, and calculating the degree to which the service deviates from normal; and correcting that deviation degree using the change record of the service interaction diagram and the change in the in-degree of the service in the service interaction diagram, to obtain the fault probability of the service.
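The deviation of a service from normal can be illustrated with a small sketch. It computes the quadratic form (observation minus mean) times the inverse covariance matrix times the transposed difference, which is how the detailed steps define the deviation degree. The function names, the use of the sample covariance (dividing by n - 1), and the omission of the stationarity screening and PCA stages are assumptions made here for brevity; the text does not prescribe them.

```python
from statistics import mean


def covariance_matrix(rows):
    """Return (column means, sample covariance matrix) of a list of rows."""
    n, d = len(rows), len(rows[0])
    u = [mean(r[c] for r in rows) for c in range(d)]
    cov = [[sum((r[a] - u[a]) * (r[b] - u[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    return u, cov


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gauss-Jordan elimination with partial pivoting."""
    d = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(d):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][d] / aug[i][i] for i in range(d)]


def deviation_degree(norm_data, abnorm_row):
    """T = diff * inverse(cov(norm_data)) * diff^T for one observation row."""
    u, cov = covariance_matrix(norm_data)
    diff = [x - m for x, m in zip(abnorm_row, u)]
    # Solving cov * y = diff avoids forming the inverse matrix explicitly.
    y = solve(cov, diff)
    return sum(a * b for a, b in zip(diff, y))
```

Solving the linear system instead of inverting the covariance matrix gives the same value of T and is the more common numerical choice; an observation equal to the historical mean yields a deviation of zero.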
The service interaction diagram is established according to the service running-state call data, and the change record of the service interaction diagram is acquired according to the current time, through the following specific steps:
Step2.1, read the service running-state call data list TransferList, initialize an empty list ServiceInteractive for storing the service interaction diagram, initialize a temporary variable j=0, initialize a time variable time for storing the current time, initialize an empty list ServiceInteractiveList for storing the change record of the service interaction diagram, and execute Step2.2;
Step2.2, judge whether j is smaller than the length of list TransferList; if the condition is met, execute Step2.3; otherwise assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step2.11;
Step2.3, using the record at index j of list TransferList, obtain the service user name and store it in the initialized variable SerUser, obtain the service name and store it in the initialized variable SerPro, obtain the service call state and store it in the initialized variable SerStatus, obtain the service call number and store it in the initialized variable CallId, initialize a temporary variable h=0, and execute Step2.4;
Step2.4, judge whether h is smaller than the length of list ServiceInteractive; if the condition is met, execute Step2.5, otherwise execute Step2.10;
Step2.5, initialize a String variable ServiceUser1 for storing the service user name obtained from the entry at index h of list ServiceInteractive, and execute Step2.6;
Step2.6, judge whether ServiceUser1 is equal to SerUser; if the condition is met, execute Step2.7, otherwise execute Step2.9;
Step2.7, replace the entry at index h of list ServiceInteractive with that entry + CallId + "-" + SerStatus + "(" + SerPro + ")", and execute Step2.8;
Step2.8, j++, execute Step2.2;
Step2.9, h++, execute Step2.4;
Step2.10, add SerUser + "" + CallId + "-" + SerStatus + "(" + SerPro + ")" to list ServiceInteractive, and execute Step2.8;
Step2.11, monitor new service call data in real time, obtain the service user name and store it in the initialized variable ServiceUser, obtain the service name and store it in the initialized variable ServicePro, obtain the service call state and store it in the initialized variable ServiceStatus, obtain the call number and store it in the initialized variable ServiceCallId, initialize a temporary variable p=0, and execute Step2.12;
Step2.12, read the list ServiceInteractive that stores the service interaction diagram, and execute Step2.13;
Step2.13, judge whether ServiceStatus is equal to "Working"; if the condition is met, execute Step2.14, otherwise execute Step2.20;
Step2.14, judge whether p is smaller than the length of list ServiceInteractive; if the condition is met, execute Step2.15, otherwise execute Step2.19;
Step2.15, initialize a String variable ServerUser for storing the service user name obtained from the entry at index p of list ServiceInteractive, and execute Step2.16;
Step2.16, judge whether ServerUser is equal to ServiceUser; if the condition is met, execute Step2.17, otherwise execute Step2.18;
Step2.17, replace the entry at index p of list ServiceInteractive with that entry + ServiceCallId + "-" + ServiceStatus + ",", assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step3;
Step2.18, p++, execute Step2.14;
Step2.19, add ServiceUser + ":" + ServiceCallId + "(" + ServicePro + ")" to list ServiceInteractive, assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step3;
Step2.20, judge whether p is smaller than the length of list ServiceInteractive; if the condition is met, execute Step2.21, otherwise execute Step2.29;
Step2.21, initialize a list ServerCallId for storing the service call numbers obtained from the entry at index p of list ServiceInteractive, initialize a String variable ServerUser for storing the service user name obtained from the entry at index p of list ServiceInteractive, initialize a temporary variable q=0, and execute Step2.22;
Step2.22, judge whether ServerUser is equal to ServiceUser; if the condition is met, execute Step2.23, otherwise execute Step2.28;
Step2.23, judge whether q is smaller than the length of list ServerCallId; if the condition is met, execute Step2.24, otherwise execute Step2.27;
Step2.24, judge whether the entry at index q of list ServerCallId is equal to ServiceCallId; if the condition is met, execute Step2.25, otherwise execute Step2.26;
Step2.25, update ServiceInteractive with the current service call data given by ServiceUser, ServicePro, ServiceStatus and ServiceCallId, and execute Step3;
Step2.26, q++, execute Step2.23;
Step2.27, replace the entry at index p of list ServiceInteractive with that entry + ServiceCallId + "(" + ServicePro + ")", assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step3;
Step2.28, p++, execute Step2.20;
Step2.29, add ServiceUser + "-" + ServiceCallId + "," + ServiceStatus + "(" + ServicePro + ")" to list ServiceInteractive, assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step3.
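The Step2 procedure can be sketched compactly as follows. This is a simplified illustration rather than the procedure itself: it swaps the concatenated string entries for a dictionary that maps each service user name to a list of (call number, call state, service name) tuples, merges the "Working" and non-"Working" branches into a single update-or-append rule, and counts every recorded call to a service as one unit of in-degree. The function names, the tuple layout and the "Stopped" state used in the usage note are hypothetical.

```python
import copy
import time


def build_interaction_graph(transfer_list):
    """Build an adjacency list: service user name -> list of calls it makes."""
    graph = {}
    for call_id, user, service, status in transfer_list:
        graph.setdefault(user, []).append((call_id, status, service))
    return graph


def apply_new_call(graph, history, call, now=None):
    """Apply one newly monitored call and record a timestamped snapshot."""
    call_id, user, service, status = call
    calls = graph.setdefault(user, [])
    for i, (existing_id, _, _) in enumerate(calls):
        if existing_id == call_id:
            # Known call number: overwrite its state in place.
            calls[i] = (call_id, status, service)
            break
    else:
        # Unknown call number (or new user): append a new edge.
        calls.append((call_id, status, service))
    stamp = time.time() if now is None else now
    # Deep copy so later updates do not alter the stored change record.
    history.append((stamp, copy.deepcopy(graph)))


def in_degree(graph, service):
    """Number of recorded calls that target the given service."""
    return sum(1 for calls in graph.values()
               for _, _, target in calls if target == service)
```

For example, after building the graph from two running calls and applying a call with a "Stopped" state for an existing call number, the entry is updated in place while the earlier snapshot in the change record still shows the old state.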
The fault probability of the service is calculated according to the service interaction diagram and the change record of the service interaction diagram, through the following specific steps:
Step3.1, initialize an empty list ColumnList for storing the column indexes of the monitoring data, initialize temporary variables g=0 and v=0, and execute Step3.2;
Step3.2, read the change records in list ServiceInteractiveList from before time, and execute Step3.3;
Step3.3, read the service interaction diagram ServiceInteractive at time, and execute Step3.4;
Step3.4, traverse ServiceInteractive, store the names of the running services as list ServiceList, and execute Step3.5;
Step3.5, judge whether v is smaller than the length of ServiceList; if the condition is met, execute Step3.6, otherwise end;
Step3.6, read the operation data set A of the service indicated by the service name at index v of list ServiceList, and execute Step3.7; the columns of the operation data set A are the monitoring index categories;
Step3.7, read the environment data set B of the service indicated by the service name at index v of list ServiceList, and execute Step3.8; the columns of the environment data set B are the monitoring index categories;
Step3.8, use the data sets A and B to obtain the operation data and environment data, before time, of the service indicated by the service name at index v of list ServiceList, store them in list NormData, and execute Step3.9;
Step3.9, use the data sets A and B to obtain the operation data and environment data, at time, of the service indicated by the service name at index v of list ServiceList, store them in list AbnormData, and execute Step3.10;
Step3.10, judge whether g is smaller than the number of columns of list NormData; if the condition is met, execute Step3.11, otherwise execute Step3.15;
Step3.11, initialize a list PDict for storing the result of the ADF test of the stationarity of column g of NormData, and execute Step3.12;
Step3.12, judge whether PDict[1] < 0.05 && PDict[0] < PDict[4]['5%']; if the condition is met, execute Step3.13, otherwise execute Step3.14; PDict[1] is the p-value of the ADF test, PDict[0] is the ADF test statistic, and PDict[4]['5%'] is the critical value of the statistic at the 5% level;
Step3.13, store the index of column g of NormData in ColumnList, and execute Step3.14;
Step3.14, g++, execute Step3.10;
Step3.15, filter NormData by the indexes in ColumnList and assign the result to NormData, filter AbnormData by the indexes in ColumnList and assign the result to AbnormData, and execute Step3.16;
Step3.16, calculate the rank of NormData, assign the result to rank, and execute Step3.17;
Step3.17, reduce NormData to rank dimensions using the PCA algorithm, and execute Step3.18;
Step3.18, map AbnormData into the same rank dimensions using the PCA algorithm, and execute Step3.19;
Step3.19, calculate the mean of each column of NormData, store the means as list u, and execute Step3.20;
Step3.20, calculate the inverse matrix invCov of the covariance matrix of NormData, and execute Step3.21;
Step3.21, compute the row vector of monitoring-index mean differences diff = AbnormData - u, and execute Step3.22;
Step3.22, compute the column vector diffT as the transpose of the row vector diff, and execute Step3.23;
Step3.23, compute the service deviation degree T = diff × invCov × diffT, and execute Step3.24;
Step3.24, calculate the in-degree of the service indicated by the service name at index v of ServiceList in each interaction diagram of list ServiceInteractiveList, store the results as list IList, and execute Step3.25;
Step3.25, initialize variable EI for storing the mean service in-degree calculated from IList, initialize variable SI for storing the variance of the service in-degree calculated from IList, and execute Step3.26;
Step3.26, initialize variable IN for storing the in-degree, in the interaction diagram ServiceInteractive, of the service indicated by the service name at index v of ServiceList, and execute Step3.27;
Step3.27, compute the fault probability correction parameter of the service indicated by the service name at index v of list ServiceList as f = 1 + abs((IN - EI)/(SI + 1)), where abs is the absolute value operation, and execute Step3.28;
Step3.28, compute the fault probability of the service indicated by the service name at index v of list ServiceList as k = 1 - int(exp(-x/2)/(2*x*pi)^(1/2), x, T/f, +inf); where pi is the circle constant, exp(-x/2) is the natural constant e raised to the power -x/2, +inf is positive infinity, int() is the integration function, and int(exp(-x/2)/(2*x*pi)^(1/2), x, T/f, +inf) is the integral of the function f(x) = exp(-x/2)/(2*x*pi)^(1/2) over the argument x from the lower limit T/f to the upper limit positive infinity; execute Step3.29;
Step3.29, output k, and execute Step3.30;
Step3.30, v++, execute Step3.5.
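The last stage of Step3 (in-degree correction followed by the tail integral) can be sketched as below. The sketch assumes the integrand is read as exp(-x/2)/sqrt(2*pi*x), which is the chi-square density with one degree of freedom; under that reading the integral has the closed form 1 - erf(sqrt(a/2)) for a lower limit a, so k = erf(sqrt(T/(2f))). Using the population variance for SI is also an assumption, since the steps only say "variance", and the function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def correction_factor(in_degree_history, current_in_degree):
    """f = 1 + |IN - EI| / (SI + 1), from the in-degrees in the change record."""
    ei = mean(in_degree_history)        # EI: mean historical in-degree
    si = pvariance(in_degree_history)   # SI: variance (population form assumed)
    return 1 + abs((current_in_degree - ei) / (si + 1))


def failure_probability(t, f):
    """k = 1 - integral from T/f to +inf of exp(-x/2) / sqrt(2*pi*x) dx."""
    if t < 0 or f <= 0:
        raise ValueError("t must be non-negative and f positive")
    # Closed form of the chi-square (1 degree of freedom) CDF at T/f.
    return math.erf(math.sqrt(t / (2 * f)))
```

With an unchanged in-degree the factor f equals 1 and k depends on T alone; because f never drops below 1, the lower limit T/f never exceeds T.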
A cloud environment service failure probability computing system based on a service interaction graph, comprising: the extraction module is used for extracting service running state call data in the target data set; the acquisition module is used for establishing a service interaction diagram according to the service running state calling data and acquiring a change record of the service interaction diagram according to the current time; and the calculation module is used for calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram.
The calculation module is used for traversing the names of the services in the running state in the service interaction diagram and reading in turn the operation data and environment data of the service corresponding to each name; for screening out the stationary monitoring indexes from the operation data and environment data of the service before the reading time, filtering both the data before the reading time and the data at the reading time by those stationary monitoring indexes, and calculating the degree to which the service deviates from normal; and for correcting that deviation degree using the change record of the service interaction diagram and the change in the in-degree of the service in the service interaction diagram, to obtain the fault probability of the service.
A terminal comprising a processor, a memory and a computer program stored in the memory and executable on the processor, the processor being configured to perform the method of any of the preceding claims.
A computer readable storage medium comprising a stored program which, when executed by a processor, causes the processor to implement a method as claimed in any one of the preceding claims.
The beneficial effects of the invention are as follows: the invention monitors and inspects the various monitoring data of services in a cloud computing environment, and calculates the probability of a service fault by combining the actual running condition of each service with the call relationships between services. This helps the responsible personnel take emergency measures for services with a high fault probability, so that service outages caused by service faults, and the resulting inconvenience to users, are avoided.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a detailed flowchart of Step1 in FIG. 1;
FIG. 3 is a first detailed flowchart of Step2 in FIG. 1;
FIG. 4 is a second detailed flowchart of Step2 in FIG. 1;
FIG. 5 is a first detailed flowchart of Step3 in FIG. 1;
FIG. 6 is a second detailed flowchart of Step3 in FIG. 1;
FIG. 7 is an embodiment of establishing a service interaction diagram.
Detailed Description
The invention is further described below with reference to the drawings and examples, but the scope of the invention is not limited to what is described.
Example 1: As shown in FIGS. 1-6, a cloud environment service fault probability calculation method based on a service interaction diagram comprises the following steps:
extracting service running state calling data in a target data set;
establishing a service interaction diagram according to service running state calling data, and acquiring a change record of the service interaction diagram according to the current time;
and calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram.
Optionally, the extracting service running state call data in the target data set includes: reading a target data set monitored in a cloud environment, and extracting service call data from the target data set; and screening the service call data to obtain the service call data in the current running state, wherein the service call data is used as the service running state call data.
Optionally, the creating a service interaction diagram according to the service running state call data, and obtaining a change record of the service interaction diagram according to the current time includes: acquiring service user names, service call numbers, service call states and service names according to service running state call data, establishing a service interaction diagram represented by an adjacency list, and monitoring new service call data in real time for updating the service interaction diagram; and recording the current time and acquiring a change record of the service interaction diagram.
Optionally, calculating the fault probability of the service according to the service interaction diagram and its change record comprises: traversing the names of the services in the running state in the service interaction diagram, and reading in turn the operation data and environment data of the service corresponding to each name; screening out the stationary monitoring indexes from the operation data and environment data of the service before the reading time, filtering both the data before the reading time and the data at the reading time by those stationary monitoring indexes, and calculating the degree to which the service deviates from normal; and correcting that deviation degree using the change record of the service interaction diagram and the change in the in-degree of the service in the service interaction diagram, to obtain the fault probability of the service.
Optionally, the extracting service running state call data in the target data set specifically includes the following steps:
Step1.1, read the service call data set D monitored in the cloud environment, extract the service call data from it, with the attributes of each service call record separated from each other by colons, store the data in the initialized list ServiceDataList, initialize a temporary variable i=0, and execute Step1.2; the attributes of the service call data comprise a service call number, a service user name, a service name and a service call state;
Step1.2, initialize variable ServiceStatus1 = "Working", initialize an empty list TransferList for storing the service running-state call data, and execute Step1.3;
Step1.3, judge whether i is smaller than the length of list ServiceDataList; if the condition is met, execute Step1.4, otherwise execute Step2;
Step1.4, initialize variable ServiceRunStatus for storing the service call state obtained from the record at index i of list ServiceDataList, and execute Step1.5;
Step1.5, judge whether ServiceRunStatus is equal to ServiceStatus1; if the condition is met, execute Step1.6, otherwise execute Step1.7;
Step1.6, add the record at index i of list ServiceDataList to list TransferList, and execute Step1.7;
Step1.7, i++, execute Step1.3.
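Step1.1 to Step1.7 amount to a filter over colon-separated records. A minimal sketch follows, assuming each record lists its attributes in the order call number, user name, service name, call state; the text names the four attributes but does not fix their order, and the function name and sample records are hypothetical.

```python
def extract_running_calls(service_data_list, running_status="Working"):
    """Keep only the call records whose call state equals running_status."""
    transfer_list = []
    for record in service_data_list:
        # Assumed field order: call number : user name : service name : state
        call_id, user, service, status = record.split(":")
        if status == running_status:
            transfer_list.append(record)
    return transfer_list


records = [
    "c1:userA:svcX:Working",
    "c2:userB:svcX:Stopped",
    "c3:userA:svcY:Working",
]
running = extract_running_calls(records)
```

Here `running` holds the first and third records, which play the role of list TransferList passed on to Step2.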
Optionally, the service interaction diagram is established according to the service running-state call data, and the change record of the service interaction diagram is acquired according to the current time, through the following specific steps:
Step2.1, read the service running-state call data list TransferList, initialize an empty list ServiceInteractive for storing the service interaction diagram, initialize a temporary variable j=0, initialize a time variable time for storing the current time, initialize an empty list ServiceInteractiveList for storing the change record of the service interaction diagram, and execute Step2.2;
Step2.2, judge whether j is smaller than the length of list TransferList; if the condition is met, execute Step2.3; otherwise assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step2.11;
Step2.3, using the record at index j of list TransferList, obtain the service user name and store it in the initialized variable SerUser, obtain the service name and store it in the initialized variable SerPro, obtain the service call state and store it in the initialized variable SerStatus, obtain the service call number and store it in the initialized variable CallId, initialize a temporary variable h=0, and execute Step2.4;
Step2.4, judge whether h is smaller than the length of list ServiceInteractive; if the condition is met, execute Step2.5, otherwise execute Step2.10;
Step2.5, initialize a String variable ServiceUser1 for storing the service user name obtained from the entry at index h of list ServiceInteractive, and execute Step2.6;
Step2.6, judge whether ServiceUser1 is equal to SerUser; if the condition is met, execute Step2.7, otherwise execute Step2.9;
Step2.7, replace the entry at index h of list ServiceInteractive with that entry + CallId + "-" + SerStatus + "(" + SerPro + ")", and execute Step2.8;
Step2.8, j++, execute Step2.2;
Step2.9, h++, execute Step2.4;
Step2.10, add SerUser + "" + CallId + "-" + SerStatus + "(" + SerPro + ")" to list ServiceInteractive, and execute Step2.8;
Step2.11, monitoring new service call data in real time, obtaining the service user name and storing it in an initialized variable ServiceUser, obtaining the service name and storing it in an initialized variable ServicePro, obtaining the service call state and storing it in an initialized variable ServiceStatus, obtaining the call number and storing it in an initialized variable ServiceCallId, initializing a temporary variable p=0, and executing Step2.12;
Step2.12, reading the list ServiceInteractive that stores the service interaction diagram, and executing Step2.13;
Step2.13, judging whether ServiceStatus is equal to "Working"; if the condition is met, executing Step2.14, otherwise executing Step2.20;
Step2.14, judging whether p is smaller than the length of the list ServiceInteractive; if the condition is met, executing Step2.15, otherwise executing Step2.19;
Step2.15, initializing a String type variable ServerUser for storing the service user name obtained from the data with index p recorded in the list ServiceInteractive, and executing Step2.16;
Step2.16, judging whether ServerUser is equal to ServiceUser; if the condition is met, executing Step2.17, otherwise executing Step2.18;
Step2.17, replacing the data with index p recorded in the list ServiceInteractive with that data +ServiceCallId+"-"+ServiceStatus+"("+ServicePro+")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.18, p++, and executing Step2.14;
Step2.19, adding ServiceUser+":"+ServiceCallId+"-"+ServiceStatus+"("+ServicePro+")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.20, judging whether p is smaller than the length of the list ServiceInteractive; if the condition is met, executing Step2.21, otherwise executing Step2.29;
Step2.21, initializing a list ServerCallId for storing the service call numbers obtained from the data with index p recorded in the list ServiceInteractive, initializing a String type variable SerUser for storing the service user name obtained from the data with index p recorded in the list ServiceInteractive, initializing a temporary variable q=0, and executing Step2.22;
Step2.22, judging whether SerUser is equal to ServiceUser; if the condition is met, executing Step2.23, otherwise executing Step2.28;
Step2.23, judging whether q is smaller than the length of the list ServerCallId; if the condition is met, executing Step2.24, otherwise executing Step2.27;
Step2.24, judging whether the data with index q recorded in the list ServerCallId is equal to ServiceCallId; if the condition is met, executing Step2.25, otherwise executing Step2.26;
Step2.25, updating ServiceInteractive with the current service call data given by ServiceUser, ServicePro, ServiceStatus and ServiceCallId, and executing Step3;
Step2.26, q++, and executing Step2.23;
Step2.27, replacing the data with index p recorded in the list ServiceInteractive with that data +ServiceCallId+"-"+ServiceStatus+"("+ServicePro+")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.28, p++, and executing Step2.20;
Step2.29, adding ServiceUser+":"+ServiceCallId+"-"+ServiceStatus+"("+ServicePro+")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3.
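The initial construction of the interaction diagram (Step2.2 to Step2.10) can be sketched in Python as follows. This is only an illustrative sketch: it assumes the colon-separated record layout "CallId:ServiceUser:ServicePro:Status" used in the embodiment, and the function name build_interaction_diagram is hypothetical rather than taken from the method itself.

```python
def build_interaction_diagram(transfer_list):
    """Group running calls by service user, one string entry per user."""
    service_interactive = []
    for record in transfer_list:                        # loop over index j
        call_id, ser_user, ser_pro, ser_status = record.split(":")
        call = f"{call_id}-{ser_status}({ser_pro})"
        for h, entry in enumerate(service_interactive):
            if entry.split(":", 1)[0] == ser_user:      # Step2.6: same user?
                service_interactive[h] = entry + call   # Step2.7: extend entry
                break
        else:
            # Step2.10: no entry for this user yet, so create one
            service_interactive.append(f"{ser_user}:{call}")
    return service_interactive


transfer_list = [
    "0:Hotel:Cinema:Working",
    "1:Hotel:Wineshop:Working",
    "2:Plane:Hotel:Working",
]
diagram = build_interaction_diagram(transfer_list)
print(diagram)
```

Each entry keeps the string encoding described in the steps (user name, then one "number-state(service)" fragment per call), so the result can be compared directly with the lists shown in the embodiment.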
Optionally, the service fault probability is obtained according to the service interaction diagram and the change record of the service interaction diagram, which comprises the following specific steps:
Step3.1, initializing an empty list ColumnList for storing the column indexes of the monitoring data, initializing temporary variables g=0 and v=0, and executing Step3.2;
Step3.2, reading the change records in ServiceInteractiveList from before time, and executing Step3.3;
Step3.3, reading the service interaction diagram ServiceInteractive at time, and executing Step3.4;
Step3.4, traversing ServiceInteractive, storing the names of the running services in a list ServiceList, and executing Step3.5;
Step3.5, judging whether v is smaller than the length of ServiceList; if the condition is met, executing Step3.6, otherwise ending;
Step3.6, reading the operation data set A of the service indicated by the service name with index v in the list ServiceList, and executing Step3.7; wherein the columns of the operation data set A are the monitoring index categories;
Step3.7, reading the environment data set B of the service indicated by the service name with index v in the list ServiceList, and executing Step3.8; wherein the columns of the environment data set B are the monitoring index categories;
Step3.8, using the data sets A and B, obtaining the operation data and environment data before time of the service indicated by the service name with index v in the list ServiceList, storing them in a list NormData, and executing Step3.9;
Step3.9, using the data sets A and B, obtaining the operation data and environment data at time of the service indicated by the service name with index v in the list ServiceList, storing them in a list AbnormData, and executing Step3.10;
Step3.10, judging whether g is smaller than the number of columns of the list NormData; if the condition is met, executing Step3.11, otherwise executing Step3.15;
Step3.11, initializing a list PDict for storing the result of the ADF test of the stationarity of column g of NormData, including the probability value, the confidence levels and other information, and executing Step3.12; wherein the ADF test determines whether a sequence has a unit root: if the sequence is stationary, no unit root exists; otherwise, a unit root exists;
Step3.12, judging whether PDict[1]<0.05 && PDict[0]<PDict[4]['5%']; if the condition is met, executing Step3.13, otherwise executing Step3.14; wherein PDict[1] is the probability value of the ADF test, PDict[0] is the ADF test statistic, and PDict[4]['5%'] is the critical value of the statistic at the 5% level;
Step3.13, storing the index of column g of NormData in ColumnList, and executing Step3.14;
Step3.14, g++, and executing Step3.10;
Step3.15, screening NormData according to the indexes in ColumnList and assigning the result to NormData, screening AbnormData according to the indexes in ColumnList and assigning the result to AbnormData, and executing Step3.16;
Step3.16, calculating the rank of NormData, assigning the result to rank, and executing Step3.17;
Step3.17, reducing NormData to rank dimensions using the PCA algorithm, and executing Step3.18; wherein the PCA algorithm is a dimensionality-reduction algorithm that projects data into a new linear space through a linear transformation and is typically used to reduce the dimensionality of a data set;
Step3.18, mapping AbnormData into rank dimensions using the PCA algorithm, and executing Step3.19;
Step3.19, calculating the mean of each column of NormData, storing the means as a list u, and executing Step3.20;
Step3.20, calculating the inverse matrix invCov of the covariance matrix of NormData, and executing Step3.21;
Step3.21, calculating the row vector of monitoring index mean differences diff=AbnormData-u, and executing Step3.22;
Step3.22, calculating the transposed column vector diffT of the row vector diff, and executing Step3.23;
Step3.23, calculating the service deviation degree T=diff × invCov × diffT, and executing Step3.24;
Step3.24, calculating the in-degree of the service indicated by the service name with index v in ServiceList in each interaction diagram of the list ServiceInteractiveList, storing the results as a list IList, and executing Step3.25;
Step3.25, initializing a variable EI for storing the mean service in-degree calculated from IList, initializing a variable SI for storing the variance of the service in-degree calculated from IList, and executing Step3.26;
Step3.26, initializing a variable IN for storing the in-degree, in the interaction diagram ServiceInteractive, of the service indicated by the service name with index v in ServiceList, and executing Step3.27;
Step3.27, calculating the fault probability correction parameter f=1+abs((IN-EI)/(SI+1)) of the service indicated by the service name with index v in the list ServiceList, wherein abs is the absolute value operation, and executing Step3.28;
Step3.28, calculating the fault probability k=1-int(exp(-x/2)/(2 × pi)/(1/2), x, T/f, +inf) of the service indicated by the service name with index v in the list ServiceList, and executing Step3.29; wherein pi is the circle constant, exp(-x/2) is the natural constant e raised to the power -x/2, +inf is positive infinity, int() is the integration function, and int(exp(-x/2)/(2 × pi)/(1/2), x, T/f, +inf) is the integral of the function f(x)=exp(-x/2)/(2 × pi)/(1/2) over the argument x from the lower limit T/f to the upper limit positive infinity;
Step3.29, outputting k, and executing Step3.30;
Step3.30, v++, and executing Step3.5.
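The deviation degree of Step3.19 to Step3.23 has the form of a squared Mahalanobis distance between the current sample and the mean of the earlier samples. The following pure-Python sketch illustrates it on a small made-up data set. Several points are assumptions, not statements of the method: the covariance is the sample covariance (divided by n-1), the inverse is computed by Gauss-Jordan elimination, the PCA step is omitted, and all function names are hypothetical.

```python
def column_means(rows):
    n = len(rows)
    return [sum(row[c] for row in rows) / n for c in range(len(rows[0]))]


def covariance(rows):
    # Sample covariance (n - 1 in the denominator); this choice is an assumption.
    n, u = len(rows), column_means(rows)
    d = len(u)
    return [[sum((r[i] - u[i]) * (r[j] - u[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(matrix):
    # Gauss-Jordan elimination with partial pivoting on [matrix | identity].
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def deviation_degree(norm_data, abnorm_row):
    u = column_means(norm_data)                      # Step3.19
    inv_cov = invert(covariance(norm_data))          # Step3.20
    diff = [a - b for a, b in zip(abnorm_row, u)]    # Step3.21
    d = len(diff)
    left = [sum(diff[i] * inv_cov[i][j] for i in range(d)) for j in range(d)]
    return sum(l * x for l, x in zip(left, diff))    # Step3.22 and Step3.23


# Made-up data: four "normal" samples with two monitoring indexes each.
norm_data = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
t_value = deviation_degree(norm_data, [2.0, 0.0])
print(t_value)
```

A sample equal to the column means yields a deviation degree of 0, and the value grows as the sample moves away from the mean in directions where the earlier data varied little.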
Further, a cloud environment service fault probability computing system based on a service interaction graph may be provided, including: the extraction module is used for extracting service running state call data in the target data set; the acquisition module is used for establishing a service interaction diagram according to the service running state calling data and acquiring a change record of the service interaction diagram according to the current time; and the calculation module is used for calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram.
Optionally, the computing module is configured to traverse the names of the services in the running state in the service interaction diagram and sequentially read the operation data and environment data of the services corresponding to those names; to screen stable monitoring indexes according to the service operation data and environment data before the reading time, to screen, according to the stable monitoring indexes, the service operation data and environment data before the reading time and the service operation data and environment data at the reading time, and to calculate the degree to which the service deviates from normal; and to correct the degree of deviation using the change records of the service interaction diagram and the change in the in-degree of the service in the service interaction diagram, thereby obtaining the fault probability of the service.
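The in-degree correction and the final probability can be illustrated with the short Python sketch below. It rests on several assumptions that the text does not settle: the in-degree of a service is taken as the number of call fragments naming it in parentheses, SI is computed as a population variance, and the integrand is read as exp(-x/2)/sqrt(2·pi), which gives the closed-form tail sqrt(2/pi)·exp(-T/(2f)). The function names are hypothetical.

```python
import math


def in_degree(diagram, service):
    # Count call fragments whose called service, in parentheses, is `service`.
    return sum(entry.count(f"({service})") for entry in diagram)


def correction_parameter(history, current, service):
    i_list = [in_degree(d, service) for d in history]        # in-degree per record
    ei = sum(i_list) / len(i_list)                            # mean in-degree EI
    si = sum((x - ei) ** 2 for x in i_list) / len(i_list)     # variance SI (assumed population)
    in_now = in_degree(current, service)                      # current in-degree IN
    return 1 + abs((in_now - ei) / (si + 1))                  # f


def fault_probability(t, f):
    # Assumed reading: k = 1 - integral of exp(-x/2)/sqrt(2*pi) from T/f to +inf.
    lower = t / f
    tail = math.sqrt(2 / math.pi) * math.exp(-lower / 2)
    return 1 - tail


hotel = "Hotel:0-Working(Cinema)1-Working(Wineshop)"
history = [
    [hotel, "Plane:2-Working(Hotel)"],
    [hotel + "4-Working(Plane)", "Plane:2-Working(Hotel)"],
    [hotel + "4-Working(Plane)", "Plane:2-Working(Hotel)", "Wineshop:6-Working(Plane)"],
    ["Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema)",
     "Plane:2-Working(Hotel)", "Wineshop:6-Working(Plane)"],
]
current = ["Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema)",
           "Plane:2-Working(Hotel)", "Wineshop:6-End(Plane)"]

f_hotel = correction_parameter(history, current, "Hotel")
f_plane = correction_parameter(history, current, "Plane")
print(f_hotel, f_plane, fault_probability(6.0, f_hotel))
```

Because f is never smaller than 1, a service whose in-degree has shifted away from its historical mean has its deviation degree divided by a larger factor before the integral is evaluated.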
Still further, a terminal may be provided comprising a processor, a memory and a computer program stored in the memory and executable on the processor, the processor being configured to perform the method of any of the above.
Still further, a computer readable storage medium may be provided, the computer readable storage medium comprising a stored program which, when executed by a processor, causes the processor to implement the method of any of the above.
Example 2: as shown in FIGS. 1-7, a cloud environment service fault probability calculation method based on a service interaction diagram, wherein execution proceeds in the order Step1, Step2, Step3 until execution terminates;
the method comprises the following specific steps:
the service call data attribute table is shown in Table 1, which gives the attribute labels of the service call data and their meanings;
Table 1 Service call data attribute table
CallId | Service call number |
ServiceUser | Service user name |
ServicePro | Service name |
ServiceRunStatus | Service call state |
Further, the method may be provided with the specific steps of:
the Step1 is specifically as follows:
Step1.1, reading the service call data set D monitored in the cloud environment, extracting the service call data from D, separating the attributes of each service call record with colons, storing the records in an initialized list ServiceDataList, initializing a temporary variable i=0, and executing Step1.2; the attributes of the service call data comprise the service call number, the service user name, the service name and the service call state;
Table 2 service invocation data set table
CallId | ServiceUser | ServicePro | ServiceRunStatus |
0 | Hotel | Cinema | Working |
1 | Hotel | Wineshop | Working |
2 | Plane | Hotel | Working |
3 | ScenicSpot | Train | End |
ServiceDataList=[0:Hotel:Cinema:Working,1:Hotel:Wineshop:Working,2:Plane:Hotel:Working,3:ScenicSpot:Train:End];
Step1.2, initializing a variable ServiceStatus1="Working", initializing an empty list TransferList for storing the service running-state call data, and executing Step1.3;
Step1.3, judging whether i is smaller than the length of the list ServiceDataList; if the condition is met, executing Step1.4, otherwise executing Step2;
when i=0, since the length of the list ServiceDataList is 4 and 0<4, Step1.4 is executed;
when i=4, since the length of the list ServiceDataList is 4 and 4 is not less than 4, Step2 is executed;
Step1.4, initializing a variable ServiceRunStatus for storing the service call state obtained from the data with index i recorded in the list ServiceDataList, and executing Step1.5;
when i=0, ServiceRunStatus="Working"; when i=3, ServiceRunStatus="End";
Step1.5, judging whether ServiceRunStatus is equal to ServiceStatus1; if the condition is met, executing Step1.6, otherwise executing Step1.7;
when i=0, since ServiceStatus1="Working" and ServiceRunStatus="Working", Step1.6 is executed;
when i=3, since ServiceStatus1="Working" and ServiceRunStatus="End", Step1.7 is executed;
Step1.6, adding the data with index i recorded in the list ServiceDataList to the list TransferList, and executing Step1.7;
when i=0, TransferList adds the record "0:Hotel:Cinema:Working";
Step1.7, i++, and executing Step1.3;
by looping through Step1.3 to Step1.7, the list of running service calls is finally obtained as follows:
TransferList=[0:Hotel:Cinema:Working,1:Hotel:Wineshop:Working,2:Plane:Hotel:Working];
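The loop of Step1.3 to Step1.7 amounts to filtering the records whose call state equals "Working". A minimal Python sketch using the Table 2 records is given below; the function name extract_running_calls is hypothetical, and the state is assumed to be the fourth colon-separated field, as in Table 2.

```python
def extract_running_calls(service_data_list, running_state="Working"):
    """Keep only the records whose call state equals running_state."""
    transfer_list = []
    for record in service_data_list:              # loop over index i
        service_run_status = record.split(":")[3]  # Step1.4: read the state
        if service_run_status == running_state:    # Step1.5: compare
            transfer_list.append(record)           # Step1.6: keep the record
    return transfer_list


service_data_list = [
    "0:Hotel:Cinema:Working",
    "1:Hotel:Wineshop:Working",
    "2:Plane:Hotel:Working",
    "3:ScenicSpot:Train:End",
]
transfer_list = extract_running_calls(service_data_list)
print(transfer_list)
```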
the Step2 is specifically as follows:
Step2.1, reading the service running-state call data list TransferList, initializing an empty list ServiceInteractive for storing the service interaction diagram, initializing a temporary variable j=0, initializing a time variable time for storing the current time, initializing an empty list ServiceInteractiveList for storing the change records of the service interaction diagram, and executing Step2.2; TransferList=[0:Hotel:Cinema:Working,1:Hotel:Wineshop:Working,2:Plane:Hotel:Working]; time="2021-06-05 14:02:00";
Step2.2, judging whether j is smaller than the length of the list TransferList; if the condition is met, executing Step2.3; otherwise, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step2.11;
when j=0, since the length of the list TransferList is 3 and 0<3, Step2.3 is executed;
when j=3, since the length of the list TransferList is 3 and 3 is not less than 3, time="2021-06-05 14:02:07", ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)]], and Step2.11 is executed;
Step2.3, using the data with index j recorded in the list TransferList, obtaining the service user name and storing it in an initialized variable SerUser, obtaining the service name and storing it in an initialized variable SerPro, obtaining the service call state and storing it in an initialized variable SerStatus, obtaining the service call number and storing it in an initialized variable CallId, initializing h=0, and executing Step2.4;
when j=0, since TransferList.get(0)="0:Hotel:Cinema:Working", SerUser="Hotel", SerPro="Cinema", SerStatus="Working", CallId="0";
Step2.4, judging whether h is smaller than the length of the list ServiceInteractive; if the condition is met, executing Step2.5, otherwise executing Step2.10;
when j=0, h=0, since the length of the list ServiceInteractive is 0 and 0 is not less than 0, Step2.10 is executed;
when j=1, h=0, since the length of the list ServiceInteractive is 1 and 0<1, Step2.5 is executed;
Step2.5, initializing a String type variable ServiceUser1 for storing the service user name obtained from the data with index h recorded in the list ServiceInteractive, and executing Step2.6;
when j=1, h=0, since the data with index h recorded in ServiceInteractive is "Hotel:0-Working(Cinema)", ServiceUser1="Hotel";
Step2.6, judging whether ServiceUser1 is equal to SerUser; if the condition is met, executing Step2.7, otherwise executing Step2.9;
when j=1, h=0, since ServiceUser1=SerUser="Hotel", Step2.7 is executed;
when j=2, h=0, since ServiceUser1="Hotel" and SerUser="Plane", Step2.9 is executed;
Step2.7, replacing the data with index h recorded in the list ServiceInteractive with that data +CallId+"-"+SerStatus+"("+SerPro+")", and executing Step2.8;
when j=1, h=0, the data with index 0 in the list ServiceInteractive is replaced by "Hotel:0-Working(Cinema)1-Working(Wineshop)", and Step2.8 is executed;
Step2.8, j++, and executing Step2.2;
Step2.9, h++, and executing Step2.4;
Step2.10, adding SerUser+":"+CallId+"-"+SerStatus+"("+SerPro+")" to the list ServiceInteractive, and executing Step2.8;
when j=0, h=0, ServiceInteractive adds "Hotel:0-Working(Cinema)", and Step2.8 is executed;
Step2.11, monitoring new service call data in real time, obtaining the service user name and storing it in an initialized variable ServiceUser, obtaining the service name and storing it in an initialized variable ServicePro, obtaining the service call state and storing it in an initialized variable ServiceStatus, obtaining the call number and storing it in an initialized variable ServiceCallId, initializing p=0, and executing Step2.12;
when the service call data monitored in real time is "4:Hotel:Plane:Working", ServiceUser="Hotel", ServicePro="Plane", ServiceStatus="Working", ServiceCallId="4";
Step2.12, reading the list ServiceInteractive that stores the service interaction diagram, and executing Step2.13;
after looping through Step2.2 to Step2.10, ServiceInteractive=[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)];
Step2.13, judging whether ServiceStatus is equal to "Working"; if the condition is met, executing Step2.14, otherwise executing Step2.20;
when the service call data monitored in real time is "4:Hotel:Plane:Working", Step2.14 is executed;
when the service call data monitored in real time is "0:Hotel:Cinema:End", Step2.20 is executed;
Step2.14, judging whether p is smaller than the length of the list ServiceInteractive; if the condition is met, executing Step2.15, otherwise executing Step2.19;
when p=0, since the length of the list ServiceInteractive is 2 and 0<2, Step2.15 is executed;
when p=2, since the length of the list ServiceInteractive is 2 and 2 is not less than 2, Step2.19 is executed;
Step2.15, initializing a String type variable ServerUser for storing the service user name obtained from the data with index p recorded in the list ServiceInteractive, and executing Step2.16;
when p=1, since the data with index 1 recorded in the list ServiceInteractive is "Plane:2-Working(Hotel)", ServerUser="Plane", and Step2.16 is executed;
Step2.16, judging whether ServerUser is equal to ServiceUser; if the condition is met, executing Step2.17, otherwise executing Step2.18;
when the service call data monitored in real time is "4:Hotel:Plane:Working" and p=0, ServerUser=ServiceUser="Hotel", and Step2.17 is executed;
when the service call data monitored in real time is "6:Wineshop:Plane:Working" and p=0, ServerUser="Hotel" and ServiceUser="Wineshop", and Step2.18 is executed;
Step2.17, replacing the data with index p recorded in the list ServiceInteractive with that data +ServiceCallId+"-"+ServiceStatus+"("+ServicePro+")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
when the service call data monitored in real time is "4:Hotel:Plane:Working" and p=0, the data with index 0 recorded in the list ServiceInteractive is replaced by "Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane)", time="2021-06-05 14:02:11", ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)]];
Step2.18, p++, and executing Step2.14;
Step2.19, adding ServiceUser+":"+ServiceCallId+"-"+ServiceStatus+"("+ServicePro+")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
when the service call data monitored in real time is "6:Wineshop:Plane:Working" and p=2, the list ServiceInteractive adds "Wineshop:6-Working(Plane)", time="2021-06-05 14:02:13"; ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)],"2021-06-05 14:02:13",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)]];
Step2.20, judging whether p is smaller than the length of the list ServiceInteractive; if the condition is met, executing Step2.21, otherwise executing Step2.29;
when p=0, since the length of the list ServiceInteractive is 3 and 0<3, Step2.21 is executed;
when p=3, since the length of the list ServiceInteractive is 3 and 3 is not less than 3, Step2.29 is executed;
Step2.21, initializing a list ServerCallId for storing the call numbers obtained from the data with index p recorded in ServiceInteractive, initializing a String type variable SerUser for storing the service user name obtained from the data with index p recorded in ServiceInteractive, initializing q=0, and executing Step2.22;
when p=0, since the data with index 0 recorded in ServiceInteractive is "Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane)", ServerCallId=[0,1,4], SerUser="Hotel", and Step2.22 is executed;
Step2.22, judging whether SerUser is equal to ServiceUser; if the condition is met, executing Step2.23, otherwise executing Step2.28;
when the service call data monitored in real time is "0:Hotel:Cinema:End" and p=0, Step2.23 is executed;
when the service call data monitored in real time is "6:Wineshop:Plane:End" and p=0, ServiceUser="Wineshop" and SerUser="Hotel", and Step2.28 is executed;
Step2.23, judging whether q is smaller than the length of the list ServerCallId; if the condition is met, executing Step2.24, otherwise executing Step2.27;
when p=0, q=0 and the service call data monitored in real time is "0:Hotel:Cinema:End", the length of the list ServerCallId is 3 and 0<3, and Step2.24 is executed;
when p=0, q=3 and the service call data monitored in real time is "7:Hotel:Cinema:End", the length of the list ServerCallId is 3 and 3 is not less than 3, and Step2.27 is executed;
Step2.24, judging whether the data with index q recorded in the list ServerCallId is equal to ServiceCallId; if the condition is met, executing Step2.25, otherwise executing Step2.26;
when the service call data monitored in real time is "0:Hotel:Cinema:End", p=0 and q=0, since the data with index 0 recorded in ServerCallId is "0" and ServiceCallId="0", Step2.25 is executed;
when the service call data monitored in real time is "7:Hotel:Cinema:End", p=0 and q=0, since the data with index 0 recorded in ServerCallId is "0" and ServiceCallId="7", Step2.26 is executed;
Step2.25, updating ServiceInteractive with the current service call data given by ServiceUser, ServicePro, ServiceStatus and ServiceCallId, and executing Step3;
when the service call data monitored in real time is "0:Hotel:Cinema:End", p=0 and q=0, since ServiceUser="Hotel", ServicePro="Cinema", ServiceStatus="End" and ServiceCallId="0", ServiceInteractive=[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)];
Step2.26, q++, and executing Step2.23;
Step2.27, replacing the data with index p recorded in the list ServiceInteractive with that data +ServiceCallId+"-"+ServiceStatus+"("+ServicePro+")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
when the service call data monitored in real time is "7:Hotel:Cinema:End", p=0 and q=3, the data with index 0 in the list ServiceInteractive is replaced by "Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema)", time="2021-06-05 14:02:15"; ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)],"2021-06-05 14:02:13",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)],"2021-06-05 14:02:15",[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)]];
when the service call data monitored in real time is "6:Wineshop:Plane:End", p=2 and q=0, the data with index 2 in ServiceInteractive is replaced by "Wineshop:6-End(Plane)" and time="2021-06-05 14:02:20"; ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)],"2021-06-05 14:02:13",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)],"2021-06-05 14:02:15",[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)],"2021-06-05 14:02:20",[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-End(Plane)]];
Step2.28, p++, and executing Step2.20;
Step2.29, adding ServiceUser+":"+ServiceCallId+"-"+ServiceStatus+"("+ServicePro+")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3.
As can be seen from Step2, each time monitored service call data changes the service interaction diagram, Step2 terminates and the subsequent flow is executed. In the embodiment of Step3 below, the subsequent flow is described using the result after the service interaction diagram changed at time "2021-06-05 14:02:20".
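The real-time branch (Step2.11 to Step2.29) can be sketched as a single update function applied once per monitored record. This is an illustrative sketch under stated assumptions: the regular expression and the function name apply_call are hypothetical, the record layout is "CallId:ServiceUser:ServicePro:Status" as in the embodiment, and the bookkeeping of time and the change-record list is omitted.

```python
import re

# One call fragment inside an entry, e.g. "4-Working(Plane)".
CALL = re.compile(r"(\d+)-(\w+)\((\w+)\)")


def apply_call(diagram, record):
    """Return a new diagram list after applying one monitored call record."""
    call_id, user, provider, status = record.split(":")
    new_call = f"{call_id}-{status}({provider})"
    diagram = list(diagram)
    for p, entry in enumerate(diagram):
        name, calls = entry.split(":", 1)
        if name != user:
            continue
        known_ids = [m.group(1) for m in CALL.finditer(calls)]
        if status != "Working" and call_id in known_ids:
            # Step2.25: overwrite the existing fragment that has this number.
            calls = CALL.sub(
                lambda m: new_call if m.group(1) == call_id else m.group(0),
                calls)
            diagram[p] = f"{name}:{calls}"
        else:
            # Step2.17 / Step2.27: append a new fragment to the user's entry.
            diagram[p] = entry + new_call
        return diagram
    # Step2.19 / Step2.29: the user has no entry yet.
    diagram.append(f"{user}:{new_call}")
    return diagram


diagram = ["Hotel:0-Working(Cinema)1-Working(Wineshop)", "Plane:2-Working(Hotel)"]
for record in ["4:Hotel:Plane:Working", "6:Wineshop:Plane:Working",
               "0:Hotel:Cinema:End", "7:Hotel:Cinema:End", "6:Wineshop:Plane:End"]:
    diagram = apply_call(diagram, record)
print(diagram)
```

Applying the five records from the embodiment in this order reproduces the diagram that Step3 reads at "2021-06-05 14:02:20".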
The Step3 is specifically as follows:
Step3.1, initializing an empty list ColumnList for storing the column indexes of the monitoring data, initializing g=0 and v=0, and executing Step3.2;
Step3.2, reading the change records in ServiceInteractiveList from before time, and executing Step3.3;
time="2021-06-05 14:02:20";
ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)],"2021-06-05 14:02:13",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)],"2021-06-05 14:02:15",[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)]];
Step3.3, reading the service interaction diagram ServiceInteractive at time, and executing Step3.4;
ServiceInteractive=[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-End(Plane)];
Step3.4, traversing ServiceInteractive, storing the names of the running services in a list ServiceList, and executing Step3.5;
ServiceList=[Hotel,Wineshop,Plane];
Step3.5, judging whether v is smaller than the length of ServiceList; if the condition is met, executing Step3.6, otherwise ending execution;
Step3.6, reading the operation data set A of the service indicated by the service name with index v in the list ServiceList, and executing Step3.7;
(the following description mainly uses the operation data of the service "Hotel");
Table 3 Monitoring index attribute table of service operation data
time | Service operation data acquisition time |
%usr | Percentage of CPU occupied by the service process in user space |
%system | Percentage of CPU occupied by the service process in kernel space |
%guest | Percentage of CPU occupied by the service process in a virtual machine |
%mem | Percentage of memory occupied by the service process |
kB_rd/s | Kilobytes read from disk by the service process per second |
kB_wr/s | Kilobytes written to disk by the service process per second |
Table 4 Hotel service operation data table
time | %usr | %system | %guest | %mem | kB_rd/s | kB_wr/s |
2021-06-05 14:02:00 | 0.61 | 0.55 | 0.7 | 0.38 | 0.54 | 0.31 |
2021-06-05 14:02:01 | 0.82 | 0.41 | 0.35 | 0.34 | 1.29 | 0.09 |
2021-06-05 14:02:02 | 0.56 | 0.73 | 0.15 | 0.43 | 1.74 | 0.89 |
2021-06-05 14:02:03 | 0.68 | 0.99 | 0.13 | 0.37 | 0.01 | 0.30 |
2021-06-05 14:02:04 | 0.47 | 0.35 | 0.53 | 0.41 | 0.77 | 0.54 |
2021-06-05 14:02:05 | 0.35 | 0.61 | 0.13 | 0.49 | 0.64 | 0.23 |
2021-06-05 14:02:06 | 0.84 | 0.57 | 0.09 | 0.50 | 0.12 | 0.03 |
2021-06-05 14:02:07 | 0.59 | 0.88 | 0.23 | 0.46 | 0.40 | 0.01 |
2021-06-05 14:02:08 | 0.92 | 0.51 | 0.43 | 0.45 | 0.45 | 0.47 |
2021-06-05 14:02:09 | 0.10 | 0.39 | 0.13 | 0.39 | 0.46 | 0.31 |
2021-06-05 14:02:10 | 0.57 | 0.32 | 0.22 | 0.32 | 0.54 | 1.38 |
2021-06-05 14:02:11 | 0.15 | 0.92 | 0.50 | 0.32 | 0.14 | 0.09 |
2021-06-05 14:02:12 | 0.73 | 0.99 | 0.75 | 0.33 | 1.75 | 0.24 |
2021-06-05 14:02:13 | 0.6 | 0.05 | 0.65 | 0.32 | 0.24 | 0.78 |
2021-06-05 14:02:14 | 0.2 | 0.81 | 0.11 | 0.31 | 0.23 | 1.37 |
2021-06-05 14:02:15 | 0.71 | 0.16 | 0.24 | 0.32 | 0.57 | 0.89 |
2021-06-05 14:02:16 | 0.44 | 0.51 | 0.50 | 0.37 | 0.24 | 1.15 |
2021-06-05 14:02:17 | 0.8 | 0.84 | 0.53 | 0.42 | 0.98 | 1.24 |
2021-06-05 14:02:18 | 0.5 | 0.07 | 0.41 | 0.22 | 0.08 | 0.52 |
2021-06-05 14:02:19 | 0.54 | 0.7 | 0.7 | 0.29 | 0.14 | 0.18 |
2021-06-05 14:02:20 | 0.92 | 0.51 | 0.45 | 0.34 | 0.0 | 0.0 |
Step3.7, reading the environment data set B of the service indicated by the service name with index v in the list ServiceList, and executing Step3.8;
(the following description mainly uses the CPU environment data of the service "Hotel", v=0);
Table 5 Service environment data monitoring index attribute table
Table 6 Hotel service environment data table
time | cpu_user | cpu_sys | cpu_free |
2021-06-05 14:02:00 | 6 | 8 | 86 |
2021-06-05 14:02:01 | 4 | 14 | 82 |
2021-06-05 14:02:02 | 5 | 11 | 84 |
2021-06-05 14:02:03 | 2 | 13 | 84 |
2021-06-05 14:02:04 | 3 | 11 | 86 |
2021-06-05 14:02:05 | 3 | 4 | 93 |
2021-06-05 14:02:06 | 6 | 5 | 92 |
2021-06-05 14:02:07 | 4 | 11 | 85 |
2021-06-05 14:02:08 | 3 | 11 | 86 |
2021-06-05 14:02:09 | 3 | 10 | 87 |
2021-06-05 14:02:10 | 19 | 10 | 71 |
2021-06-05 14:02:11 | 5 | 13 | 82 |
2021-06-05 14:02:12 | 4 | 9 | 87 |
2021-06-05 14:02:13 | 5 | 4 | 91 |
2021-06-05 14:02:14 | 8 | 6 | 86 |
2021-06-05 14:02:15 | 9 | 14 | 77 |
2021-06-05 14:02:16 | 3 | 13 | 84 |
2021-06-05 14:02:17 | 5 | 4 | 91 |
2021-06-05 14:02:18 | 9 | 14 | 77 |
2021-06-05 14:02:19 | 6 | 11 | 83 |
2021-06-05 14:02:20 | 70 | 5 | 30 |
Step3.8, using the data set A, B to obtain running data and environment data of the service before the time, which are denoted by the service name with the index v in the list ServiceList, storing the running data and the environment data into the list NormData, and executing step3.9;
normdata= [ [0.61,0.55,0.7,0.38,0.54,0.31,6.0,8.0,86.0], [0.82,0.41,0.35,0.34,1.29,0.09,4.0,14.0,82.0], [0.56,0.73,0.15,0.43,1.74,0.89,5.0,11.0,84.0], [0.68,0.99,0.13,0.37,0.01,0.30,3.0,13.0,84.0], [0.47,0.35,0.53,0.41,0.77,0.54,3.0,11.0,86.0], [0.35,0.61,0.13,0.49,0.64,0.23,3.0,4.0,93.0], [0.84,0.57,0.09,0.50,0.12,0.03,3.0,5.0,92.0], [0.59,0.88,0.23,0.46,0.40,0.01,4.0,11.0,85.0], [0.92,0.51,0.43,0.45,0.45,0.47,3.0,11.0,86.0], [0.1,0.39,0.13,0.39,0.46,0.31,3.0,10.0,87.0], [0.57,0.32,0.22,0.32,0.54,1.38,19.0,10.0,71.0], [0.15,0.92,0.5,0.32,0.14,0.09,5.0,13.0,82.0], [0.73,0.99,0.75,0.33,1.75,0.57,4.0,9.0,87.0], [0.6,0.05,0.65,0.32,0.24,0.78,5.0,4.0,91.0], [0.2,0.81,0.11,0.31,0.23,1.37,8.0,6.0,86.0], [0.71,0.16,0.24,0.32,0.57,0.89,9.0,14.0,77.0], [0.44,0.51,0.5,0.37,0.24,1.15,3.0,13.0,84.0], [0.8,0.84,0.53,0.42,0.98,1.24,5.0,4.0,91.0], [0.5,0.07,0.41,0.22,0.08,0.52,9.0,14.0,77.0], [0.54,0.7,0.7,0.29,0.14,0.18,6.0,11.0,83.0] ], perform step3.9;
Step3.9, using the data set A, B to obtain the running data and environment data of the service at the time, which are denoted by the service name with the index v, storing the running data and environment data into a list abnorm data, and executing step3.10;
abnorm data= [ [0.92,0.51,0.45,0.22,0.0,0.0,70.0,5.0,30.0] ], step3.10 is performed;
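Steps 3.8 and 3.9 can be illustrated with a minimal sketch. It assumes, hypothetically, that each data set is held as a mapping from a timestamp string to that row's metric values; the function name `split_by_time` and this storage layout are not from the source. Rows of A and B with the same timestamp are concatenated, rows earlier than time go to NormData, and the row at time goes to AbnormData.

```python
def split_by_time(rows_a, rows_b, time):
    """Join data sets A and B on timestamp and split them around `time`.

    rows_a, rows_b: dict mapping a timestamp string to a list of metric values
    (hypothetical layout). Timestamps share one fixed format, so plain string
    comparison orders them chronologically.
    """
    norm_data, abnorm_data = [], []
    for ts in sorted(rows_a):
        if ts not in rows_b:
            continue  # skip timestamps that only one data set has
        joined = [float(x) for x in list(rows_a[ts]) + list(rows_b[ts])]
        if ts < time:
            norm_data.append(joined)      # running + environment data before time
        elif ts == time:
            abnorm_data.append(joined)    # running + environment data at time
    return norm_data, abnorm_data


# Last two rows of the tables above, used only as sample input.
A = {
    "2021-06-05 14:02:19": [0.54, 0.7, 0.7, 0.29, 0.14, 0.18],
    "2021-06-05 14:02:20": [0.92, 0.51, 0.45, 0.34, 0.0, 0.0],
}
B = {
    "2021-06-05 14:02:19": [6, 11, 83],
    "2021-06-05 14:02:20": [70, 5, 30],
}
norm_data, abnorm_data = split_by_time(A, B, "2021-06-05 14:02:20")
print(norm_data)
print(abnorm_data)
```

Each resulting row has nine columns: six running-data indexes followed by the three CPU indexes.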
Step3.10, judging whether g is smaller than the number of columns of the list NormData; if the condition is met, executing Step3.11, otherwise executing Step3.15;
Step3.11, initializing a list PDict for storing the result of the ADF test on the stationarity of column g of NormData, including information such as the probability value and the confidence levels, and executing Step3.12; the ADF test determines whether a sequence has a unit root: if the sequence is stationary, no unit root exists; otherwise, a unit root exists;
when g=0, PDict = (-4.075336199834505, 0.0010635440168466083, 8, 11, {'1%': -4.223238279489106, '5%': -3.189368925619835, '10%': -2.729839421487603}, -19.87455866559617);
when g=1, PDict = (33.589052234909, 1.0, 8, 11, {'1%': -4.223238279489106, '5%': -3.189368925619835, '10%': -2.729839421487603}, -138.91275540402972);
Step3.12, judging whether PDict[1] < 0.05 && PDict[0] < PDict[4]['5%']; if the condition is satisfied, executing Step3.13, otherwise executing Step3.14; here PDict[1] is the probability value of the ADF test, PDict[0] is the statistic of the ADF test, and PDict[4]['5%'] is the threshold of the statistic at the 5% level;
when g=0, PDict[1] < 0.05 && PDict[0] < PDict[4]['5%'] returns True, since 0.0010635440168466083 < 0.05 and -4.075336199834505 < -3.189368925619835, so Step3.13 is performed;
when g=1, the condition returns False, since 1.0 is not less than 0.05 and 33.589052234909 is not less than -3.189368925619835, so Step3.14 is performed;
Step3.13, storing the index g of the NormData column into ColumnList, and executing Step3.14;
when g=8, ColumnList = [0,2,4,5,6,7,8], and Step3.14 is performed;
Step3.14, g++; Step3.10 is performed;
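The decision rule of Step3.12 can be sketched as follows. The function name `is_stationary` is hypothetical. The six-element layout of PDict looks like the tuple returned by `statsmodels.tsa.stattools.adfuller`, but the source does not name a library, so the sketch takes an already computed result tuple as input and uses the two results quoted above.

```python
def is_stationary(pdict, alpha=0.05, level="5%"):
    """Apply the Step3.12 condition to one ADF result tuple.

    pdict[0] is the ADF statistic, pdict[1] the probability value and
    pdict[4] a dict of threshold values keyed by level.
    """
    statistic, p_value = pdict[0], pdict[1]
    threshold = pdict[4][level]
    return p_value < alpha and statistic < threshold


CRITICAL = {"1%": -4.223238279489106, "5%": -3.189368925619835, "10%": -2.729839421487603}
pdict_g0 = (-4.075336199834505, 0.0010635440168466083, 8, 11, CRITICAL, -19.87455866559617)
pdict_g1 = (33.589052234909, 1.0, 8, 11, CRITICAL, -138.91275540402972)

print(is_stationary(pdict_g0))  # column 0 passes both checks
print(is_stationary(pdict_g1))  # column 1 fails both checks
```

Column 0 is therefore kept in ColumnList and column 1 is dropped, in agreement with the ColumnList shown above.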
Step3.15, screening NormData by the indexes in ColumnList and assigning the result to NormData, screening AbnormData by the indexes in ColumnList and assigning the result to AbnormData, and executing Step3.16;
NormData=[[0.61,0.7,0.54,0.31,6.0,8.0,86.0],[0.82,0.35,1.29,0.09,4.0,14.0,82.0],[0.56,0.15,1.74,0.89,5.0,11.0,84.0],[0.68,0.13,0.01,0.3,3.0,13.0,84.0],[0.47,0.53,0.77,0.54,3.0,11.0,86.0],[0.35,0.13,0.64,0.23,3.0,4.0,93.0],[0.84,0.09,0.12,0.03,3.0,5.0,92.0],[0.59,0.23,0.4,0.01,4.0,11.0,85.0],[0.92,0.43,0.45,0.47,3.0,11.0,86.0],[0.1,0.13,0.46,0.31,3.0,10.0,87.0],[0.57,0.22,0.54,1.38,19.0,10.0,71.0],[0.15,0.5,0.14,0.09,5.0,13.0,82.0],[0.73,0.75,1.75,0.57,4.0,9.0,87.0],[0.6,0.65,0.24,0.78,5.0,4.0,91.0],[0.2,0.11,0.23,1.37,8.0,6.0,86.0],[0.71,0.24,0.57,0.89,9.0,14.0,77.0],[0.44,0.5,0.24,1.15,3.0,13.0,84.0],[0.8,0.53,0.98,1.24,5.0,4.0,91.0],[0.5,0.41,0.08,0.52,9.0,14.0,77.0],[0.54,0.7,0.14,0.18,6.0,11.0,83.0]];
AbnormData=[[0.92,0.45,0.0,0.0,70.0,5.0,30.0]]
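The screening of Step3.15 amounts to keeping only the columns whose indexes are in ColumnList. A minimal sketch, with the hypothetical helper name `select_columns`, applied to the AbnormData row from Step3.9:

```python
def select_columns(rows, column_list):
    """Keep, in order, only the columns whose indexes appear in column_list."""
    return [[row[i] for i in column_list] for row in rows]


column_list = [0, 2, 4, 5, 6, 7, 8]
abnorm_data = [[0.92, 0.51, 0.45, 0.22, 0.0, 0.0, 70.0, 5.0, 30.0]]
screened = select_columns(abnorm_data, column_list)
print(screened)  # [[0.92, 0.45, 0.0, 0.0, 70.0, 5.0, 30.0]]
```

The same call applied to each row of NormData drops columns 1 and 3, which is how the nine-column rows become the seven-column rows listed above.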
Step3.16, calculating the rank of NormData, assigning the result to rank, and executing Step3.17; rank=7;
Step3.17, reducing NormData to rank dimensions using the PCA algorithm, and executing Step3.18; the PCA algorithm is a dimensionality reduction algorithm that projects data into a new linear space by a linear transformation and is typically used to reduce the dimensionality of a data set;
NormData=[[-1.51915662e+00,1.67839776e+00,-9.20561037e-02,-3.38842790e-01,-2.64695288e-01,-1.94106294e-01,2.02951168e-15],[3.09969325e+00,-4.21268869e+00,6.41895208e-01,-4.63350517e-01,1.04619896e-02,1.35936248e-01,1.97317363e-15],[7.99512232e-01,-1.22276523e+00,1.20472136e+00,7.61755242e-02,3.38544626e-01,7.17087567e-02,2.19326982e-15],[6.65381589e-01,-4.07184012e+00,-5.48551153e-01,1.35649016e-01,3.94163354e-02,3.03229694e-01,2.52166725e-15],[-1.71497174e+00,-2.53390341e+00,2.17091346e-01,1.34425782e-01,-3.69823257e-02,-1.61444424e-01,2.71151084e-15],[-1.00483507e+01,2.79331143e+00,-1.13096013e-01,-2.90500359e-01,3.62564786e-01,-4.41687077e-02,2.23873657e-15],[-8.85980366e+00,2.01835423e+00,-5.88929521e-01,-3.96373715e-01,2.86496876e-02,3.91458784e-01,1.89719011e-15],[-4.65831542e-01,-1.92124920e+00,-2.95541821e-01,-3.52152210e-01,1.07258662e-01,9.57465920e-02,2.04324234e-15],[-1.71486944e+00,-2.53711857e+00,-5.29178806e-02,8.06016362e-02,-2.75872203e-01,2.70897174e-01,1.89094551e-15],[-2.90592677e+00,-1.78180914e+00,-2.29068506e-01,3.92806065e-02,4.65706687e-01,-1.94198925e-01,1.91513725e-15],[1.72148406e+01,8.61868906e+00,8.86409244e-02,-1.29882178e-01,5.61703665e-02,6.53787970e-02,2.15085305e-15],[3.17340122e+00,-2.79663097e+00,-5.45576843e-01,-1.57469659e-01,1.08848189e-01,-3.83597058e-01,1.89284501e-15],[-2.84439695e+00,-3.65879298e-01,1.17415795e+00,-2.62013060e-01,-2.01017673e-01,-1.97807974e-01,2.12283801e-15],[-7.52459778e+00,4.12149037e+00,-2.84234464e-01,1.44557945e-01,-3.00385928e-01,-1.06971844e-01,2.82677042e-15],[-1.36268013e+00,4.57424522e+00,-2.20091668e-01,6.38934580e-01,3.10575735e-01,-4.54315668e-02,1.92748756e-15],[9.40110648e+00,-9.25540015e-01,1.22701671e-01,1.63800898e-01,-1.56101956e-02,2.12645340e-01,2.28739167e-15],[6.79417525e-01,-4.01643378e+00,-9.75945892e-02,9.11857218e-01,-1.35100285e-01,-5.77634972e-02,1.87225485e-15],[-7.52027715e+00,4.14895123e+00,5.64961382e-01,3.60955151e-01,-2.41847803e-01,9.65943256e-02,1.58271040e-15],[9.39692754e+00,-9.48477781e-01,-4.62117635e-01,-3.15259612e-02,-7.31925704e-02,-4.42096053e-02,2.08775882e-15],[2.05058200e+00,-6.19103091e-01,-4.84393648e-01,-2.64127908e-01,-2.83492792e-01,-2.13895816e-01,1.78453432e-15]];
Step3.18, mapping AbnormData into rank dimensions using the PCA algorithm, and executing Step3.19;
AbnormData=[[71.23996651,45.46498957,-1.21745986,-4.89679241,-0.25414131,-0.31979014,-2.88675135]];
step3.19, calculating the average value of each column of NormData, storing the average value as a list u, and executing step3.20;
u=[2.26485497e-15,7.27196081e-16,-9.15933995e-17,-7.49400542e-17,-3.88578059e-17,4.44089210e-17,2.09749146e-15];
Step3.20, calculating an inverse matrix invCov of the NormData covariance matrix, and executing step3.21;
invCov=[[2.32302503e-02,-4.38464744e-18,5.56724225e-18,-1.51830280e-16,5.58517535e-17,6.47921316e-17,9.36298368e-01],[-4.38464744e-18,8.41187742e-02,-1.33227819e-16,-6.37438566e-17,1.43954637e-16,-1.27559453e-16,7.82404275e-01],[5.56724225e-18,-1.33227819e-16,3.56641403e+00,1.47796868e-15,-4.04874223e-16,5.96567255e-16,-9.87258228e+00],[-1.51830280e-16,-6.37438566e-17,1.47796868e-15,8.10529429e+00,6.33868403e-16,5.36172840e-15,-1.79497241e+01],[5.58517535e-17,1.43954637e-16,-4.04874223e-16,6.33868403e-16,1.84594147e+01,9.22921793e-15,-1.89707015e+01],[6.47921316e-17,-1.27559453e-16,5.96567255e-16,5.36172840e-15,9.22921793e-15,2.48556248e+01,1.44491727e+01],[9.36298368e-01,7.82404275e-01,-9.87258228e+00,-1.79497241e+01,1.89707015e+01,1.44491727e+01,1.06677334e+31]];
Step3.21, calculating the monitoring index mean difference row vector diff = AbnormData - u, and executing Step3.22;
diff = [[71.23996651,45.46498957,-1.21745986,-4.89679241,-0.25414131,-0.31979014,-2.88675135]] - [2.26485497e-15,7.27196081e-16,-9.15933995e-17,-7.49400542e-17,-3.88578059e-17,4.44089210e-17,2.09749146e-15];
Step3.22, calculating the transposed column vector diffT of the row vector diff, and executing Step3.23;
Step3.23, calculating the service deviation degree T = diff × invCov × diffT, and executing Step3.24; T = 8.8897778e+31;
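Steps 3.21 to 3.23 evaluate a quadratic form: the row vector diff, times the inverse covariance matrix, times the transpose of diff. A minimal sketch in plain Python follows. The helper name `quadratic_form` and the small two-dimensional input are illustrative assumptions and are not the data of the embodiment.

```python
def quadratic_form(diff, inv_cov):
    """Return diff * inv_cov * diff^T for a row vector and a square matrix."""
    n = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(n) for j in range(n))


# Illustrative two-dimensional input (not taken from the embodiment).
abnorm = [3.0, 1.0]
u = [1.0, 2.0]                       # column means of the normal data
inv_cov = [[0.5, 0.0], [0.0, 2.0]]   # inverse of the covariance matrix
diff = [a - m for a, m in zip(abnorm, u)]
T = quadratic_form(diff, inv_cov)
print(T)  # 4.0

# In the embodiment, the last diagonal entry of invCov dominates the result.
dominant = (-2.88675135) ** 2 * 1.06677334e+31
print(dominant)
```

For the embodiment's values, the product of the squared last component of diff and the last diagonal entry of invCov (about 1.07e+31) already reproduces the reported T to within rounding, so the other terms contribute negligibly.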
Step3.24, calculating the in-degree of the service indicated by the service name with index v in ServiceList in each interaction diagram of the list ServiceInteractiveList, storing the results as a list IList, and executing Step3.25;
In the interaction diagram [Hotel: 0-Working(Cinema) 1-Working(Wineshop), Plane: 2-Working(Hotel)], only the edge Plane: 2-Working(Hotel) points to Hotel, so the in-degree of Hotel in this diagram is 1. Likewise, IList = [1, 1];
Step3.25, initializing a variable EI for storing the mean service in-degree calculated from IList, initializing a variable SI for storing the service in-degree variance calculated from IList, and executing Step3.26; EI=1; SI=0;
Step3.26, initializing a variable IN for storing the in-degree, in the interaction diagram ServiceInteractive, of the service indicated by the service name with index v in ServiceList, and executing Step3.27;
The in-degree IN=1, since in [Hotel: 0-End(Cinema) 1-Working(Wineshop) -Working(Plane) -End(Cinema), Plane: 2-Working(Hotel), Wineshop: 6-Working(Plane)] only the edge Plane: 2-Working(Hotel) points to Hotel;
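The in-degree count used in Steps 3.24 and 3.26 can be sketched on a small adjacency-list structure. The dictionary layout, with each caller mapped to a list of (call number, call state, called service) tuples, and the name `in_degree` are assumptions made for illustration. The sketch also assumes that every edge counts regardless of its call state, which the source does not state explicitly.

```python
def in_degree(graph, service):
    """Count the edges of the interaction diagram that point to `service`.

    graph: dict mapping a service user name to a list of
    (call_id, status, called_service) tuples (hypothetical layout).
    """
    return sum(
        1
        for edges in graph.values()
        for _call_id, _status, callee in edges
        if callee == service
    )


# The interaction diagram used in the Step3.24 example above.
diagram = {
    "Hotel": [(0, "Working", "Cinema"), (1, "Working", "Wineshop")],
    "Plane": [(2, "Working", "Hotel")],
}
hotel_in = in_degree(diagram, "Hotel")
print(hotel_in)  # 1
```

Applying the same function to every diagram stored in the change record would yield the list IList.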
Step3.27, calculating the fault probability correction parameter f = 1 + abs((IN - EI)/(SI + 1)) of the service indicated by the service name with index v in the list ServiceList, where abs is the absolute value operation, and executing Step3.28; f=1;
Step3.28, calculating the fault probability k = 1 - int(exp(-x/2)/(2*x*pi)^(1/2), x, T/f, +inf) of the service indicated by the service name with index v in the list ServiceList, where pi is the circle constant, exp(-x/2) is the base of the natural logarithm raised to the power -x/2, +inf is positive infinity, int() is the integration function, and int(exp(-x/2)/(2*x*pi)^(1/2), x, T/f, +inf) is the integral of exp(-x/2)/(2*x*pi)^(1/2) over x from the lower limit T/f to the upper limit positive infinity; executing Step3.29;
k=1; the value range of the fault probability is 0 to 1;
Step3.29, outputting k, and executing Step3.30;
Step3.30, v++; Step3.5 is performed.
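Steps 3.25 to 3.28 can be sketched as below. The formula in the source text is garbled, so the sketch rests on a stated assumption: the integrand is read as exp(-x/2)/sqrt(2·π·x), the chi-square density with one degree of freedom, for which 1 minus the integral from a to infinity equals erf(sqrt(a/2)). This reading is consistent with the stated value range of 0 to 1. The use of the population variance for SI and the function name `fault_probability` are likewise assumptions.

```python
import math


def fault_probability(t, in_deg, i_list):
    """Return (f, k) for deviation degree t, current in-degree and in-degree history.

    Assumes the integrand of Step3.28 is exp(-x/2)/sqrt(2*pi*x), so that
    k = 1 - integral from t/f to +inf equals erf(sqrt(t/(2*f))).
    """
    ei = sum(i_list) / len(i_list)                          # mean in-degree EI
    si = sum((x - ei) ** 2 for x in i_list) / len(i_list)   # variance SI (population form, assumed)
    f = 1 + abs((in_deg - ei) / (si + 1))                   # correction parameter of Step3.27
    k = math.erf(math.sqrt(t / (2 * f)))                    # fault probability of Step3.28
    return f, k


f, k = fault_probability(8.8897778e+31, 1, [1, 1])
print(f, k)  # 1.0 1.0
```

With the embodiment's values (T = 8.8897778e+31, IN = 1, IList = [1, 1]) the sketch gives f = 1 and k = 1, matching the values stated in the steps. A larger gap between IN and EI increases f, which lowers T/f and therefore lowers k for the same deviation degree.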
In the above steps, before PCA dimension reduction, the ADF (Augmented Dickey-Fuller) test is used to screen the monitoring indexes that are stationary during normal operation of the service. Calculating the fault probability from the deviation of a monitoring index from its normal operating condition presupposes that the index has a relatively stable mean, usable as a reference value, during normal operation of the service. The ADF test is therefore applied to keep only the indexes that are stationary during normal operation, which improves the accuracy of the calculation result. Complete collinearity among monitoring indexes, for example between CPU occupancy rate and CPU residual rate, makes the calculation result inaccurate, and the determinant of the covariance matrix may be smaller than 0. Reducing the data to rank dimensions with the PCA algorithm ensures that the covariance matrix is a positive semidefinite matrix. Together, these two steps eliminate complete collinearity among the monitoring indexes and also improve calculation efficiency. In addition, using the inverse of the covariance matrix effectively eliminates the duplicated data characteristics among the monitoring indexes, so that the calculation result is more accurate. Finally, the change of in-degree in the service interaction diagram is taken as the change in user request volume and used to correct the abnormal deviation of the monitoring indexes, so that the service fault probability is calculated more comprehensively.
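The collinearity problem mentioned above can be illustrated with the CPU columns of Table 6. In the rows chosen below, cpu_user + cpu_sys + cpu_free is always 100, so the three columns are completely collinear and their covariance matrix cannot be inverted. The sketch uses exact rational arithmetic so that the determinant comes out as exactly zero. The helper names `covariance` and `det3` are illustrative and not from the source.

```python
from fractions import Fraction


def covariance(rows):
    """Sample covariance matrix of the columns of `rows`, in exact arithmetic."""
    n, d = len(rows), len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n) for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# (cpu_user, cpu_sys, cpu_free) rows from Table 6 that each sum to 100.
cpu_rows = [(6, 8, 86), (4, 14, 82), (5, 11, 84), (3, 11, 86), (3, 4, 93)]
cov = covariance(cpu_rows)
print(det3(cov))  # 0
```

In floating-point arithmetic the same determinant typically comes out as a tiny positive or negative number rather than exactly zero, so a numerically computed inverse can contain very large entries.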
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (9)
1. A cloud environment service fault probability calculation method based on a service interaction diagram, characterized by comprising the following steps:
step1, extracting service running state call data in a target data set;
step2, a service interaction diagram is built according to service running state calling data, and a change record of the service interaction diagram is obtained according to the current time;
step3, calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram;
wherein establishing the service interaction diagram according to the service running state call data and obtaining the change record of the service interaction diagram according to the current time specifically comprises the following steps:
Step2.1, reading the service running state call data list TransferList, initializing an empty list ServiceInteractive for storing the service interaction diagram, initializing a temporary variable j=0, initializing a time variable time for storing the current time, initializing an empty list ServiceInteractiveList for storing the change record of the service interaction diagram, and executing Step2.2;
Step2.2, judging whether j is smaller than the length of the list TransferList; if the condition is satisfied, executing Step2.3; otherwise, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step2.11;
Step2.3, from the data with index j recorded in the list TransferList, obtaining the service user name and storing it in an initialized variable Seruser, obtaining the service name and storing it in an initialized variable SerPro, obtaining the service call state and storing it in an initialized variable SerStatus, obtaining the service call number and storing it in an initialized variable CallId, initializing a temporary variable h=0, and executing Step2.4;
Step2.4, judging whether h is smaller than the length of the list ServiceInteractive; if the condition is met, executing Step2.5, otherwise executing Step2.10;
Step2.5, initializing a String type variable ServiceUser1 for storing the service user name obtained from the data with index h recorded in the list ServiceInteractive, and executing Step2.6;
Step2.6, judging whether ServiceUser1 is equal to Seruser; if the condition is met, executing Step2.7, otherwise executing Step2.9;
Step2.7, replacing the data with index h recorded in the list ServiceInteractive with that data + CallId + "-" + SerStatus + "(" + SerPro + ")", and executing Step2.8;
Step2.8, j++; Step2.2 is performed;
Step2.9, h++; Step2.4 is performed;
Step2.10, adding Seruser + ":" + CallId + "-" + SerStatus + "(" + SerPro + ")" to the list ServiceInteractive, and executing Step2.8;
Step2.11, monitoring new service call data in real time, obtaining the service user name and storing it in an initialized variable ServiceUser, obtaining the service name and storing it in an initialized variable ServicePro, obtaining the service call state and storing it in an initialized variable ServiceStatus, obtaining the call number and storing it in an initialized variable ServiceCallId, initializing a temporary variable p=0, and executing Step2.12;
Step2.12, reading the list ServiceInteractive that stores the service interaction diagram, and executing Step2.13;
Step2.13, judging whether ServiceStatus is equal to "Working"; if the condition is satisfied, executing Step2.14, otherwise executing Step2.20;
Step2.14, judging whether p is smaller than the length of the list ServiceInteractive; if the condition is satisfied, executing Step2.15, otherwise executing Step2.19;
Step2.15, initializing a String type variable ServerUser for storing the service user name obtained from the data with index p recorded in the list ServiceInteractive, and executing Step2.16;
Step2.16, judging whether ServerUser is equal to ServiceUser; if the condition is satisfied, executing Step2.17, otherwise executing Step2.18;
Step2.17, replacing the data with index p recorded in the list ServiceInteractive with that data + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.18, p++; Step2.14 is performed;
Step2.19, adding ServiceUser + ":" + ServiceCallId + "(" + ServicePro + ")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.20, judging whether p is smaller than the length of the list ServiceInteractive; if the condition is met, executing Step2.21, otherwise executing Step2.29;
Step2.21, initializing a list ServerCallId for storing the service call numbers obtained from the data with index p recorded in ServiceInteractive, initializing a String type variable ServerUser for storing the service user name obtained from the data with index p recorded in ServiceInteractive, initializing a temporary variable q=0, and executing Step2.22;
Step2.22, judging whether ServerUser is equal to ServiceUser; if the condition is met, executing Step2.23, otherwise executing Step2.28;
Step2.23, judging whether q is smaller than the length of the list ServerCallId; if the condition is satisfied, executing Step2.24, otherwise executing Step2.27;
Step2.24, judging whether the data with index q recorded in the list ServerCallId is equal to ServiceCallId; if the condition is satisfied, executing Step2.25, otherwise executing Step2.26;
Step2.25, updating ServiceInteractive with the current service call data given by ServiceUser, ServicePro, ServiceStatus and ServiceCallId, and executing Step3;
Step2.26, q++; Step2.23 is performed;
Step2.27, replacing the data with index p recorded in the list ServiceInteractive with that data + ServiceCallId + "(" + ServicePro + ")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.28, p++; Step2.20 is performed;
Step2.29, adding ServiceUser + ":" + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3.
2. The cloud environment service failure probability calculation method based on the service interaction diagram according to claim 1, wherein: the extracting service running state call data in the target data set includes: reading a target data set monitored in a cloud environment, and extracting service call data from the target data set; and screening the service call data to obtain the service call data in the current running state, wherein the service call data is used as the service running state call data.
3. The cloud environment service failure probability calculation method based on the service interaction diagram according to claim 1, wherein: the step of establishing a service interaction diagram according to the service running state calling data and obtaining a change record of the service interaction diagram according to the current time comprises the following steps: acquiring service user names, service call numbers, service call states and service names according to service running state call data, establishing a service interaction diagram represented by an adjacency list, and monitoring new service call data in real time for updating the service interaction diagram; and recording the current time and acquiring a change record of the service interaction diagram.
4. The cloud environment service failure probability calculation method based on the service interaction diagram according to claim 1, wherein: calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram comprises the following steps: traversing the names of the services in the running state in the service interaction diagram, and sequentially reading the running data and the environment data of the service corresponding to each service name; screening out the stationary monitoring indexes according to the running data and the environment data of the service before the reading time, screening the running data and the environment data of the service before the reading time and the running data and the environment data of the service at the reading time according to the stationary monitoring indexes, and calculating the degree to which the service deviates from normal; and correcting the degree to which the service deviates from normal by using the change record of the service interaction diagram and the in-degree change of the service in the service interaction diagram, to obtain the fault probability of the service.
5. The cloud environment service failure probability calculation method based on the service interaction diagram according to claim 1, wherein: according to the service interaction diagram and the change record of the service interaction diagram, calculating the probability of service failure, wherein the method comprises the following specific steps:
step3.1, initializing an empty list ColumnList for storing a monitoring data list index, initializing temporary variables g=0 and v=0, and executing step3.2;
Step3.2, reading the ServiceInteractiveList at the moment before time, and executing Step3.3;
Step3.3, reading the service interaction diagram ServiceInteractive at time, and executing Step3.4;
Step3.4, traversing ServiceInteractive, storing the names of the running services as a list ServiceList, and executing Step3.5;
step3.5, judging whether v is smaller than the ServiceList length, executing step3.6 if the condition is met, and ending if the condition is not met;
step3.6, reading an operation data set A of the service pointed by the service name with the index v in the list ServiceList, and executing step3.7; wherein, the columns in the operation data set A of the service are the monitoring index categories;
Step3.7, reading the environment data set B of the service indicated by the service name with index v in the list ServiceList, and executing Step3.8; wherein the columns in the environment data set B of the service are the monitoring index categories;
Step3.8, using the data sets A and B to obtain the running data and environment data, before time, of the service indicated by the service name with index v in the list ServiceList, storing them in the list NormData, and executing Step3.9;
Step3.9, using the data sets A and B to obtain the running data and environment data, at time, of the service indicated by the service name with index v in the list ServiceList, storing them in the list AbnormData, and executing Step3.10;
Step3.10, judging whether g is smaller than the number of columns of the list NormData; if the condition is met, executing Step3.11, otherwise executing Step3.15;
Step3.11, initializing a list PDict for storing the result of the ADF test on the stationarity of column g of NormData, and executing Step3.12;
Step3.12, judging whether PDict[1] < 0.05 && PDict[0] < PDict[4]['5%']; if the condition is satisfied, executing Step3.13, otherwise executing Step3.14; wherein PDict[1] is the probability value of the ADF test, PDict[0] is the statistic of the ADF test, and PDict[4]['5%'] is the threshold of the statistic at the 5% level;
step3.13, storing the NormData column g index into a ColumnList, and executing step3.14;
step3.14, g++; step3.10 is performed;
Step3.15, screening NormData by the indexes in ColumnList and assigning the result to NormData, screening AbnormData by the indexes in ColumnList and assigning the result to AbnormData, and executing Step3.16;
Step3.16, calculating the rank of NormData, assigning the result to rank, and executing Step3.17;
Step3.17, reducing NormData to rank dimensions using the PCA algorithm, and executing Step3.18;
Step3.18, mapping AbnormData into rank dimensions using the PCA algorithm, and executing Step3.19;
step3.19, calculating the average value of each column of NormData, storing the average value as a list u, and executing step3.20;
step3.20, calculating an inverse matrix invCov of the NormData covariance matrix, and executing step3.21;
Step3.21, calculating the monitoring index mean difference row vector diff = AbnormData - u, and executing Step3.22;
Step3.22, calculating the transposed column vector diffT of the row vector diff, and executing Step3.23;
Step3.23, calculating the service deviation degree T = diff × invCov × diffT, and executing Step3.24;
Step3.24, calculating the in-degree of the service indicated by the service name with index v in ServiceList in each interaction diagram of the list ServiceInteractiveList, storing the results as a list IList, and executing Step3.25;
Step3.25, initializing a variable EI for storing the mean service in-degree calculated from IList, initializing a variable SI for storing the service in-degree variance calculated from IList, and executing Step3.26;
Step3.26, initializing a variable IN for storing the in-degree, in the interaction diagram ServiceInteractive, of the service indicated by the service name with index v in ServiceList, and executing Step3.27;
Step3.27, calculating the fault probability correction parameter f = 1 + abs((IN - EI)/(SI + 1)) of the service indicated by the service name with index v in the list ServiceList, wherein abs is the absolute value operation, and executing Step3.28;
Step3.28, calculating the fault probability k = 1 - int(exp(-x/2)/(2*x*pi)^(1/2), x, T/f, +inf) of the service indicated by the service name with index v in the list ServiceList; wherein pi is the circle constant, exp(-x/2) is the base of the natural logarithm raised to the power -x/2, +inf is positive infinity, int() is the integration function, and int(exp(-x/2)/(2*x*pi)^(1/2), x, T/f, +inf) is the integral of exp(-x/2)/(2*x*pi)^(1/2) over x from the lower limit T/f to the upper limit positive infinity; executing Step3.29;
step3.29, output k, execute step3.30;
step3.30, v++, step3.5 was performed.
6. A cloud environment service fault probability computing system based on a service interaction diagram, characterized by comprising:
an extraction module for executing Step1: extracting service running state calling data in a target data set;
an acquisition module for executing Step2: establishing a service interaction diagram according to service running state calling data, and acquiring a change record of the service interaction diagram according to the current time;
A calculation module for executing Step3: calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram;
wherein establishing the service interaction diagram according to the service running state call data and obtaining the change record of the service interaction diagram according to the current time specifically comprises the following steps:
Step2.1, reading the service running state call data list TransferList, initializing an empty list ServiceInteractive for storing the service interaction diagram, initializing a temporary variable j=0, initializing a time variable time for storing the current time, initializing an empty list ServiceInteractiveList for storing the change record of the service interaction diagram, and executing Step2.2;
Step2.2, judging whether j is smaller than the length of the list TransferList; if the condition is satisfied, executing Step2.3; otherwise, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step2.11;
Step2.3, from the data with index j recorded in the list TransferList, obtaining the service user name and storing it in an initialized variable Seruser, obtaining the service name and storing it in an initialized variable SerPro, obtaining the service call state and storing it in an initialized variable SerStatus, obtaining the service call number and storing it in an initialized variable CallId, initializing a temporary variable h=0, and executing Step2.4;
Step2.4, judging whether h is smaller than the length of the list ServiceInteractive; if the condition is met, executing Step2.5, otherwise executing Step2.10;
Step2.5, initializing a String type variable ServiceUser1 for storing the service user name obtained from the data with index h recorded in the list ServiceInteractive, and executing Step2.6;
Step2.6, judging whether ServiceUser1 is equal to Seruser; if the condition is met, executing Step2.7, otherwise executing Step2.9;
Step2.7, replacing the data with index h recorded in the list ServiceInteractive with that data + CallId + "-" + SerStatus + "(" + SerPro + ")", and executing Step2.8;
Step2.8, j++; Step2.2 is performed;
Step2.9, h++; Step2.4 is performed;
Step2.10, adding Seruser + ":" + CallId + "-" + SerStatus + "(" + SerPro + ")" to the list ServiceInteractive, and executing Step2.8;
Step2.11, monitoring new service call data in real time, obtaining the service user name and storing it in an initialized variable ServiceUser, obtaining the service name and storing it in an initialized variable ServicePro, obtaining the service call state and storing it in an initialized variable ServiceStatus, obtaining the call number and storing it in an initialized variable ServiceCallId, initializing a temporary variable p=0, and executing Step2.12;
Step2.12, reading the list ServiceInteractive that stores the service interaction diagram, and executing Step2.13;
Step2.13, judging whether ServiceStatus is equal to "Working"; if the condition is satisfied, executing Step2.14, otherwise executing Step2.20;
Step2.14, judging whether p is smaller than the length of the list ServiceInteractive; if the condition is satisfied, executing Step2.15, otherwise executing Step2.19;
Step2.15, initializing a String type variable ServerUser for storing the service user name obtained from the data with index p recorded in the list ServiceInteractive, and executing Step2.16;
Step2.16, judging whether ServerUser is equal to ServiceUser; if the condition is satisfied, executing Step2.17, otherwise executing Step2.18;
Step2.17, replacing the data with index p recorded in the list ServiceInteractive with that data + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.18, p++; Step2.14 is performed;
Step2.19, adding ServiceUser + ":" + ServiceCallId + "(" + ServicePro + ")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.20, judging whether p is smaller than the length of the list ServiceInterfective, if the condition is met, executing step2.21, otherwise, executing step2.29;
step2.21, storing a service call number obtained by using data with an index p recorded by serviceindex, initializing a String type variable server, storing a service user name obtained by using data with an index p recorded by serviceindex, initializing a temporary variable q=0, and executing step2.22;
step2.22, judging whether the server is equal to the ServiceUser, if the condition is met, executing step2.23, otherwise, executing step2.28;
step2.23, judging whether q is smaller than the length of the list ServerCallId, if the condition is satisfied, executing step2.24, otherwise, executing step2.27;
step2.24, if the data with index q recorded by list servercall is equal to serviceall, if the condition is satisfied, executing step2.25, otherwise executing step2.26;
step2.25, using the current service call data obtained by ServiceUser, servicePro, serviceStatus, serviceCallId to update servicel technical, executing Step3;
step2.26, q++, step2.23 is performed;
Step2.27, replacing the data with index p recorded in the list ServiceInteractive with the data with index p recorded in the list ServiceInteractive +servicelets +"(" +servicelets +"), assigning time to the current time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.28, p++, and executing Step2.20;
Step2.29, adding ServiceUser + "-" + ServiceCallId + "," + ServiceStatus + "(" + ServicePro + ")" to the list ServiceInteractive, assigning time to the current time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3.
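The control flow of Step2.11 to Step2.29 can be illustrated with a short Python sketch. It is not the claimed implementation: it assumes structured records in place of the concatenated strings, and the names `Entry`, `apply_call`, `entries` (standing in for ServiceInteractive) and `history` (standing in for the time-stamped change record list) are hypothetical. Only the 'Working' state appears in the claim; any other status value is a placeholder.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Entry:
    """Record for one service user: its calls as (call_id, status, service)."""
    user: str
    calls: list = field(default_factory=list)


def apply_call(entries, history, user, service, status, call_id, now=None):
    """Apply one monitored call to `entries`; return the action taken."""
    call = (call_id, status, service)
    # Steps 2.14-2.16 and 2.20-2.22: scan the list for the caller's record.
    entry = next((e for e in entries if e.user == user), None)
    if entry is None:
        # Steps 2.19 and 2.29: caller not in the diagram yet, add a record.
        entries.append(Entry(user, [call]))
        action = "added"
    else:
        known = [i for i, c in enumerate(entry.calls) if c[0] == call_id]
        if status != "Working" and known:
            # Steps 2.23-2.25: the call is already recorded, so overwrite it.
            entry.calls[known[0]] = call
            return "updated"  # Step 2.25 names no change-record write
        # Steps 2.17 and 2.27: known caller, append the call to its record.
        entry.calls.append(call)
        action = "appended"
    # Time-stamp the new state of the diagram in the change record (assumed
    # here to be a full snapshot; the claim only says time and list are added).
    history.append((now or datetime.now(), copy.deepcopy(entries)))
    return action
```

Calling `apply_call` once per monitored call keeps `entries` current, and `history` then holds one time-stamped snapshot for each structural change to the diagram.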
7. The service interaction graph-based cloud environment service failure probability calculation system of claim 6, wherein: the computing module is used for traversing the names of the services in the running state in the service interaction diagram and sequentially reading the operation data and the environment data of the service corresponding to each service name; is used for screening stable monitoring indexes according to the service operation data and environment data before the reading time, screening the service operation data and environment data before the reading time and service operation data and environment data before the reading time according to the stable monitoring indexes, and calculating the degree to which the service deviates from normal; and is used for correcting the degree to which the service deviates from normal by using the change record of the service interaction diagram and the in-degree change of the service in the service interaction diagram, to obtain the fault probability of the service.
8. A terminal comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, characterized by: the processor is configured to perform the method of any of claims 1-5.
9. A computer-readable storage medium including a stored program, characterized in that: the program, when executed by a processor, causes the processor to implement the method of any one of claims 1-5.
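The in-degree change that the computing module of claim 7 uses as a correction input can be sketched as follows. This is a minimal illustration under stated assumptions: the diagram is represented as hypothetical (caller, callee) pairs, the function names are invented for the example, and the way the change enters the correction of the deviation degree is not shown because this section does not state that formula.

```python
from collections import Counter


def in_degrees(edges):
    """Count, for each callee, how many distinct callers point at it."""
    return Counter(callee for _, callee in set(edges))


def in_degree_change(before, after):
    """Return {service: after - before} for services whose in-degree changed."""
    old, new = in_degrees(before), in_degrees(after)
    return {s: new[s] - old[s] for s in old.keys() | new.keys() if new[s] != old[s]}
```

Comparing two snapshots of the diagram this way shows which services gained or lost callers between two points in the change record.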
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111300037.2A CN114124738B (en) | 2021-11-04 | 2021-11-04 | Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114124738A (en) | 2022-03-01 |
CN114124738B (en) | 2024-03-19 |
Family
ID=80380518
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111300037.2A Active CN114124738B (en) | 2021-11-04 | 2021-11-04 | Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114124738B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572856A (en) * | 2014-12-17 | 2015-04-29 | 武汉科技大学 | Converged storage method of service source data |
CN111737033A (en) * | 2020-05-26 | 2020-10-02 | 复旦大学 | Micro-service fault positioning method based on runtime map analysis |
CN112698975A (en) * | 2020-12-14 | 2021-04-23 | 北京大学 | Fault root cause positioning method and system of micro-service architecture information system |
CN113032238A (en) * | 2021-05-25 | 2021-06-25 | 南昌惠联网络技术有限公司 | Real-time root cause analysis method based on application knowledge graph |
Also Published As
Publication number | Publication date |
---|---|
CN114124738A (en) | 2022-03-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||