CN114124738B - Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram - Google Patents


Publication number
CN114124738B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111300037.2A
Other languages
Chinese (zh)
Other versions
CN114124738A (en
Inventor
姜瑛
李荣宸
姒鉴哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202111300037.2A
Publication of CN114124738A
Application granted
Publication of CN114124738B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Exchange Systems With Centralized Control (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a cloud environment service fault probability calculation method, system and terminal based on a service interaction diagram. The method comprises: extracting service running state call data from a target data set; establishing a service interaction diagram according to the service running state call data, and obtaining a change record of the service interaction diagram according to the current time; and calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram. By monitoring and testing the various monitoring data of services in a cloud computing environment, the invention calculates the probability of a service fault from both the actual running condition of the service and the call relationships between services. This helps the responsible personnel take emergency measures for services with a high fault probability, so that service failures and the inconvenience they cause to users are avoided.

Description

Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram
Technical Field
The invention relates to a cloud environment service fault probability calculation method, system and terminal based on a service interaction diagram, and belongs to the field of service fault diagnosis in cloud computing.
Background
With the development of cloud computing, more and more services are being migrated from local deployments to the cloud. A cloud service is built from numerous components that are updated frequently and have complex dependencies, which increases the probability of failure and makes diagnosis difficult. To detect service faults earlier and reduce their impact, the actual service operation data and the service call relationships need to be analyzed together. Cheng L et al. analyzed the problems in service fault management and proposed a noise filtering algorithm and a Bayesian fault diagnosis algorithm that address the noise and dynamic characteristics of the network. Lingwei et al. analyzed the dynamics of internet services, proposed a method for updating the prior fault probability and the fault propagation model, and obtained good diagnosis results. Yen et al. proposed a graph-based microservice analysis and testing method that analyzes and visualizes risky service call chains among microservices through a service dependency graph, and can effectively locate the fault source when a fault occurs. Friedrich et al. proposed a fault diagnosis method that models the behavior of web service interfaces with finite state machines; by calculating interface similarity, behavioral inconsistencies between any two service interfaces can be analyzed and faults diagnosed. Mayer et al. proposed a system for monitoring and managing microservices which, by fusing multiple kinds of information, analyzes the cause of a fault in a microservice system more accurately.
Most existing research targets specific fault types or specific software systems. How to determine the probability of a service fault in a cloud computing environment by jointly considering service operation data, environment data and service call relationships remains a problem to be solved.
Disclosure of Invention
The invention provides a cloud environment service fault probability calculation method, system and terminal based on a service interaction diagram that comprehensively describes service information, which are used to obtain the fault probability of services in a cloud environment.
The technical scheme of the invention is as follows: a cloud environment service fault probability calculation method based on a service interaction diagram, comprising the following steps:
extracting service running state call data from a target data set;
establishing a service interaction diagram according to the service running state call data, and obtaining a change record of the service interaction diagram according to the current time;
and calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram.
Extracting the service running state call data from the target data set includes: reading a target data set monitored in the cloud environment, and extracting service call data from the target data set; and screening the service call data to obtain the service call data currently in the running state, which is used as the service running state call data.
Establishing the service interaction diagram according to the service running state call data and obtaining the change record of the service interaction diagram according to the current time includes: obtaining the service user name, service call number, service call state and service name from the service running state call data, establishing a service interaction diagram represented as an adjacency list, and monitoring new service call data in real time to update the service interaction diagram; and recording the current time to obtain a change record of the service interaction diagram.
Calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram includes: traversing the names of the services in the running state in the service interaction diagram, and reading in turn the operation data and environment data of the service corresponding to each service name; screening out the stable monitoring indexes according to the operation data and environment data of the service before the read time; using the stable monitoring indexes to filter the operation data and environment data of the service both before the read time and at the read time, and calculating the degree to which the service deviates from normal; and correcting this deviation degree using the change record of the service interaction diagram and the change in the in-degree of the service in the service interaction diagram, to obtain the fault probability of the service.
The service interaction diagram is established according to the service running state call data, and the change record of the service interaction diagram is obtained according to the current time, with the following specific steps:
Step2.1, read the service running state call data list TransferList; initialize an empty list ServiceInteractive for storing the service interaction diagram; initialize a temporary variable j=0; initialize a time variable time for storing the current time; initialize an empty list ServiceInteractiveList for storing the change record of the service interaction diagram; execute Step2.2;
Step2.2, judge whether j is smaller than the length of list TransferList; if so, execute Step2.3; otherwise, assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step2.11;
Step2.3, using the record with index j in list TransferList, obtain the service user name and store it in an initialized variable Seruser; obtain the service name and store it in an initialized variable SerPro; obtain the service call state and store it in an initialized variable SerStatus; obtain the service call number and store it in an initialized variable CallId; initialize a temporary variable h=0; execute Step2.4;
Step2.4, judge whether h is smaller than the length of list ServiceInteractive; if so, execute Step2.5; otherwise, execute Step2.10;
Step2.5, initialize a String variable ServiceUser1 for storing the service user name obtained from the record with index h in list ServiceInteractive; execute Step2.6;
Step2.6, judge whether ServiceUser1 is equal to Seruser; if so, execute Step2.7; otherwise, execute Step2.9;
Step2.7, replace the record with index h in list ServiceInteractive with that record + CallId + "-" + SerStatus + "(" + SerPro + ")"; execute Step2.8;
Step2.8, j++; execute Step2.2;
Step2.9, h++; execute Step2.4;
Step2.10, add Seruser + "" + CallId + "-" + SerStatus + "(" + SerPro + ")" to list ServiceInteractive; execute Step2.8;
Step2.11, monitor new service call data in real time; obtain the service user name and store it in an initialized variable ServiceUser; obtain the service name and store it in an initialized variable ServicePro; obtain the service call state and store it in an initialized variable ServiceStatus; obtain the call number and store it in an initialized variable ServiceCallId; initialize a temporary variable p=0; execute Step2.12;
Step2.12, read the list ServiceInteractive that stores the service interaction diagram; execute Step2.13;
Step2.13, judge whether ServiceStatus is equal to "Working"; if so, execute Step2.14; otherwise, execute Step2.20;
Step2.14, judge whether p is smaller than the length of list ServiceInteractive; if so, execute Step2.15; otherwise, execute Step2.19;
Step2.15, initialize a String variable Serveruser for storing the service user name obtained from the record with index p in list ServiceInteractive; execute Step2.16;
Step2.16, judge whether Serveruser is equal to ServiceUser; if so, execute Step2.17; otherwise, execute Step2.18;
Step2.17, replace the record with index p in list ServiceInteractive with that record + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")"; assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step3;
Step2.18, p++; execute Step2.14;
Step2.19, add ServiceUser + ":" + ServiceCallId + "(" + ServicePro + ")" to list ServiceInteractive; assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step3;
Step2.20, judge whether p is smaller than the length of list ServiceInteractive; if so, execute Step2.21; otherwise, execute Step2.29;
Step2.21, initialize a list ServerCallId for storing the service call numbers obtained from the record with index p in list ServiceInteractive; initialize a String variable Serveruser for storing the service user name obtained from the record with index p in list ServiceInteractive; initialize a temporary variable q=0; execute Step2.22;
Step2.22, judge whether Serveruser is equal to ServiceUser; if so, execute Step2.23; otherwise, execute Step2.28;
Step2.23, judge whether q is smaller than the length of list ServerCallId; if so, execute Step2.24; otherwise, execute Step2.27;
Step2.24, judge whether the record with index q in list ServerCallId is equal to ServiceCallId; if so, execute Step2.25; otherwise, execute Step2.26;
Step2.25, update ServiceInteractive with the current service call data given by ServiceUser, ServicePro, ServiceStatus and ServiceCallId; execute Step3;
Step2.26, q++; execute Step2.23;
Step2.27, replace the record with index p in list ServiceInteractive with that record + ServiceCallId + "(" + ServicePro + ")"; assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step3;
Step2.28, p++; execute Step2.20;
Step2.29, add ServiceUser + "-" + ServiceCallId + "," + ServiceStatus + "(" + ServicePro + ")" to list ServiceInteractive; assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step3.
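The initial construction of the diagram (Step2.1 to Step2.10) can be sketched as follows. This is a minimal illustration, not the patented implementation: the colon-separated field order (call number, user name, service name, call state) is an assumption, each adjacency-list entry is kept as a Python list rather than a concatenated string, and the function name and the `now` parameter are hypothetical.

```python
import copy
from datetime import datetime


def build_interaction_graph(transfer_list, now=None):
    """Group running calls by calling user, one adjacency-list entry per user."""
    service_interactive = []  # each entry: [user, "callid-status(service)", ...]
    for record in transfer_list:
        # Assumed field order: call number, user name, service name, call state.
        call_id, ser_user, ser_pro, ser_status = record.split(":")
        edge = f"{call_id}-{ser_status}({ser_pro})"
        for entry in service_interactive:
            if entry[0] == ser_user:  # Step2.6: the user already has an entry
                entry.append(edge)    # Step2.7: extend that entry
                break
        else:
            # Step2.10: no entry matched, so start a new one for this user.
            service_interactive.append([ser_user, edge])
    # Step2.2 (else branch): snapshot the diagram together with the current time.
    change_record = [(now or datetime.now(), copy.deepcopy(service_interactive))]
    return service_interactive, change_record


graph, record = build_interaction_graph([
    "1:gateway:orders:Working",
    "2:gateway:stock:Working",
    "3:orders:payment:Working",
])
print(graph)
```

Storing a deep copy in the change record keeps earlier snapshots unchanged when the live diagram is later updated by the real-time monitoring steps.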
The fault probability of the service is calculated according to the service interaction diagram and the change record of the service interaction diagram, with the following specific steps:
Step3.1, initialize an empty list ColumnList for storing the column indexes of the monitoring data; initialize temporary variables g=0 and v=0; execute Step3.2;
Step3.2, read from ServiceInteractiveList the records at times before time; execute Step3.3;
Step3.3, read the service interaction diagram ServiceInteractive at time time; execute Step3.4;
Step3.4, traverse ServiceInteractive and store the names of the running services as list ServiceList; execute Step3.5;
Step3.5, judge whether v is smaller than the length of ServiceList; if so, execute Step3.6; otherwise, end;
Step3.6, read the operation data set A of the service indicated by the service name with index v in list ServiceList; execute Step3.7; wherein the columns of the operation data set A are the monitoring index categories;
Step3.7, read the environment data set B of the service indicated by the service name with index v in list ServiceList; execute Step3.8; wherein the columns of the environment data set B are the monitoring index categories;
Step3.8, use data sets A and B to obtain the operation data and environment data, before time, of the service indicated by the service name with index v in list ServiceList, and store them in list NormData; execute Step3.9;
Step3.9, use data sets A and B to obtain the operation data and environment data, at time, of the service indicated by the service name with index v in list ServiceList, and store them in list AbnormData; execute Step3.10;
Step3.10, judge whether g is smaller than the number of columns of NormData; if so, execute Step3.11; otherwise, execute Step3.15;
Step3.11, initialize a list PDict for storing the result of the ADF test of the stationarity of column g of NormData; execute Step3.12;
Step3.12, judge whether PDict[1] < 0.05 && PDict[0] < PDict[4]['5%']; if so, execute Step3.13; otherwise, execute Step3.14; wherein PDict[1] is the p-value of the ADF test, PDict[0] is the test statistic of the ADF test, and PDict[4]['5%'] is the critical value of the statistic at the 5% significance level;
Step3.13, store the index of column g of NormData in ColumnList; execute Step3.14;
Step3.14, g++; execute Step3.10;
Step3.15, filter NormData by the indexes in ColumnList and assign the result to NormData; filter AbnormData by the indexes in ColumnList and assign the result to AbnormData; execute Step3.16;
Step3.16, calculate the rank of NormData and assign the result to rank; execute Step3.17;
Step3.17, use the PCA algorithm to reduce NormData to rank dimensions; execute Step3.18;
Step3.18, use the PCA algorithm to map AbnormData into rank dimensions; execute Step3.19;
Step3.19, calculate the mean of each column of NormData and store the means as list u; execute Step3.20;
Step3.20, calculate the inverse matrix invCov of the covariance matrix of NormData; execute Step3.21;
Step3.21, calculate the row vector of monitoring index mean differences diff = AbnormData - u; execute Step3.22;
Step3.22, calculate the transposed column vector diffT of the row vector diff; execute Step3.23;
Step3.23, calculate the service deviation degree T = diff * invCov * diffT; execute Step3.24;
Step3.24, calculate the in-degree of the service indicated by the service name with index v in list ServiceList in each interaction diagram in list ServiceInteractiveList, and store the results as list IList; execute Step3.25;
Step3.25, initialize a variable EI for storing the mean service in-degree calculated from IList; initialize a variable SI for storing the variance of the service in-degree calculated from IList; execute Step3.26;
Step3.26, initialize a variable IN for storing the in-degree, in the interaction diagram ServiceInteractive, of the service indicated by the service name with index v in list ServiceList; execute Step3.27;
Step3.27, calculate the fault probability correction parameter of the service indicated by the service name with index v in list ServiceList as f = 1 + abs((IN - EI)/(SI + 1)), wherein abs is the absolute value operation; execute Step3.28;
Step3.28, calculate the fault probability of the service indicated by the service name with index v in list ServiceList as k = 1 - int(exp(-x/2)/(2*pi)^(1/2), x, T/f, +inf); where pi is the ratio of a circle's circumference to its diameter, exp(-x/2) is e (the base of the natural logarithm) raised to the power -x/2, +inf is positive infinity, int() is the integration function, and int(exp(-x/2)/(2*pi)^(1/2), x, T/f, +inf) is the integral of the function f(x) = exp(-x/2)/(2*pi)^(1/2) with respect to x from the lower limit T/f to the upper limit +inf; execute Step3.29;
Step3.29, output k; execute Step3.30;
Step3.30, v++; execute Step3.5.
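The correction and probability steps (Step3.25 to Step3.28) can be sketched as below. This is an illustration under stated assumptions, not a definitive implementation: the integrand is read as exp(-x/2)/(2*pi)^(1/2), which has the closed-form tail integral 2*exp(-a/2)/(2*pi)^(1/2) for lower limit a; the variance SI is taken as the population variance, which the text does not specify; and the function names and sample values are hypothetical.

```python
import math
from statistics import mean, pvariance


def correction_parameter(in_degree_history, current_in_degree):
    """Step3.25 to Step3.27: f = 1 + abs((IN - EI) / (SI + 1))."""
    ei = mean(in_degree_history)       # EI: mean in-degree over the change record
    si = pvariance(in_degree_history)  # SI: variance (population variance assumed)
    return 1 + abs((current_in_degree - ei) / (si + 1))


def fault_probability(t, f):
    """Step3.28 under the assumed integrand exp(-x/2) / (2*pi)**0.5."""
    lower = t / f
    # Closed form of the integral from `lower` to +inf of the assumed integrand.
    tail = 2.0 * math.exp(-lower / 2.0) / math.sqrt(2.0 * math.pi)
    return 1.0 - tail


i_list = [2, 2, 3, 2, 1]  # hypothetical in-degrees of one service in past diagrams
f = correction_parameter(i_list, current_in_degree=4)
k = fault_probability(t=6.0, f=f)
print(f"f = {f:.4f}, k = {k:.4f}")
```

Because f is never smaller than 1, dividing T by f can only lower k for a given deviation degree under this reading of the formula.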
A cloud environment service fault probability calculation system based on a service interaction diagram, comprising: an extraction module for extracting service running state call data from a target data set; an acquisition module for establishing a service interaction diagram according to the service running state call data and obtaining a change record of the service interaction diagram according to the current time; and a calculation module for calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram.
The calculation module is used for traversing the names of the services in the running state in the service interaction diagram and reading in turn the operation data and environment data of the service corresponding to each service name; for screening out the stable monitoring indexes according to the operation data and environment data of the service before the read time, using the stable monitoring indexes to filter the operation data and environment data of the service both before the read time and at the read time, and calculating the degree to which the service deviates from normal; and for correcting this deviation degree using the change record of the service interaction diagram and the change in the in-degree of the service in the service interaction diagram, to obtain the fault probability of the service.
A terminal comprising a processor, a memory and a computer program stored in the memory and executable on the processor, the processor being configured to perform the method of any of the preceding claims.
A computer readable storage medium comprising a stored program which, when executed by a processor, causes the processor to implement a method as claimed in any one of the preceding claims.
The beneficial effects of the invention are as follows: by monitoring and testing the various monitoring data of services in a cloud computing environment, the invention calculates the probability of a service fault from both the actual running condition of the service and the call relationships between services. This helps the responsible personnel take emergency measures for services with a high fault probability, so that service failures and the inconvenience they cause to users are avoided.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a detailed flow chart of Step1 in FIG. 1;
FIG. 3 is the first detailed flow chart of Step2 in FIG. 1;
FIG. 4 is the second detailed flow chart of Step2 in FIG. 1;
FIG. 5 is the first detailed flow chart of Step3 in FIG. 1;
FIG. 6 is the second detailed flow chart of Step3 in FIG. 1;
FIG. 7 is an embodiment of establishing a service interaction diagram.
Detailed Description
The invention is further described below with reference to the drawings and examples, but the scope of the invention is not limited to them.
Example 1: As shown in FIGS. 1-6, a cloud environment service fault probability calculation method based on a service interaction diagram comprises the following steps:
extracting service running state call data from a target data set;
establishing a service interaction diagram according to the service running state call data, and obtaining a change record of the service interaction diagram according to the current time;
and calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram.
Optionally, extracting the service running state call data from the target data set includes: reading a target data set monitored in the cloud environment, and extracting service call data from the target data set; and screening the service call data to obtain the service call data currently in the running state, which is used as the service running state call data.
Optionally, establishing the service interaction diagram according to the service running state call data and obtaining the change record of the service interaction diagram according to the current time includes: obtaining the service user name, service call number, service call state and service name from the service running state call data, establishing a service interaction diagram represented as an adjacency list, and monitoring new service call data in real time to update the service interaction diagram; and recording the current time to obtain a change record of the service interaction diagram.
Optionally, calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram includes: traversing the names of the services in the running state in the service interaction diagram, and reading in turn the operation data and environment data of the service corresponding to each service name; screening out the stable monitoring indexes according to the operation data and environment data of the service before the read time; using the stable monitoring indexes to filter the operation data and environment data of the service both before the read time and at the read time, and calculating the degree to which the service deviates from normal; and correcting this deviation degree using the change record of the service interaction diagram and the change in the in-degree of the service in the service interaction diagram, to obtain the fault probability of the service.
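The deviation of a service from normal, computed in the method as T = diff * invCov * diffT from the column means and covariance of the normal data, can be illustrated with a small pure-Python sketch. The ADF screening and PCA reduction are omitted here, the sample covariance (divisor n - 1) is an assumption because the text does not say which estimator is used, and all function names and sample values are hypothetical.

```python
def column_means(rows):
    """Mean of each column (the list u)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix of the columns (divisor n - 1, assumed)."""
    n = len(rows)
    u = column_means(rows)
    d = len(u)
    return [[sum((r[i] - u[i]) * (r[j] - u[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def deviation(norm_data, abnorm_row):
    """T = diff * invCov * diffT for one observation at the read time."""
    u = column_means(norm_data)
    inv_cov = invert(covariance(norm_data))
    diff = [a - b for a, b in zip(abnorm_row, u)]
    d = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j]
               for i in range(d) for j in range(d))


norm_data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]  # hypothetical metrics
print(deviation(norm_data, [4.5, 2.5]))
```

An observation equal to the column means gives T = 0, and T grows as the observation moves away from the normal data, scaled by how much each monitoring index normally varies.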
Optionally, extracting the service running state call data from the target data set specifically includes the following steps:
Step1.1, read the service call data set D monitored in the cloud environment, extract the service call data from it, separate the attributes of each service call record with colons, and store the records in an initialized list ServiceDataList; initialize a temporary variable i=0; execute Step1.2; the attributes of the service call data comprise the service call number, service user name, service name and service call state;
Step1.2, initialize a variable ServiceStatus1 = "Working"; initialize an empty list TransferList for storing the service running state call data; execute Step1.3;
Step1.3, judge whether i is smaller than the length of list ServiceDataList; if so, execute Step1.4; otherwise, execute Step2;
Step1.4, initialize a variable ServiceRunStatus for storing the service call state obtained from the record with index i in list ServiceDataList; execute Step1.5;
Step1.5, judge whether ServiceRunStatus is equal to ServiceStatus1; if so, execute Step1.6; otherwise, execute Step1.7;
Step1.6, add the record with index i in list ServiceDataList to list TransferList; execute Step1.7;
Step1.7, i++; execute Step1.3.
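Step1.1 to Step1.7 amount to a filter over the call records, which can be sketched as follows. The field order inside each colon-separated record (call number, user name, service name, call state) is assumed from the order in which Step1.1 lists the attributes, and the function name and sample records are hypothetical.

```python
def extract_running_calls(service_data_list, running_status="Working"):
    """Return the records whose call state equals running_status."""
    transfer_list = []
    for record in service_data_list:
        # Assumed position of the call state: fourth colon-separated field.
        service_run_status = record.split(":")[3]
        if service_run_status == running_status:  # Step1.5
            transfer_list.append(record)          # Step1.6
    return transfer_list


service_data_list = [
    "1:gateway:orders:Working",
    "2:orders:payment:Finished",
    "3:orders:stock:Working",
]
transfer_list = extract_running_calls(service_data_list)
print(transfer_list)
```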
Optionally, the service interaction diagram is established according to the service running state call data, and the change record of the service interaction diagram is obtained according to the current time, with the following specific steps:
Step2.1, read the service running state call data list TransferList; initialize an empty list ServiceInteractive for storing the service interaction diagram; initialize a temporary variable j=0; initialize a time variable time for storing the current time; initialize an empty list ServiceInteractiveList for storing the change record of the service interaction diagram; execute Step2.2;
Step2.2, judge whether j is smaller than the length of list TransferList; if so, execute Step2.3; otherwise, assign the current time to time, add time and ServiceInteractive to list ServiceInteractiveList, and execute Step2.11;
Step2.3, using the record with index j in list TransferList, obtain the service user name and store it in an initialized variable Seruser; obtain the service name and store it in an initialized variable SerPro; obtain the service call state and store it in an initialized variable SerStatus; obtain the service call number and store it in an initialized variable CallId; initialize a temporary variable h=0; execute Step2.4;
Step2.4, judge whether h is smaller than the length of list ServiceInteractive; if so, execute Step2.5; otherwise, execute Step2.10;
Step2.5, initialize a String variable ServiceUser1 for storing the service user name obtained from the record with index h in list ServiceInteractive; execute Step2.6;
Step2.6, judge whether ServiceUser1 is equal to Seruser; if so, execute Step2.7; otherwise, execute Step2.9;
Step2.7, replace the record with index h in list ServiceInteractive with that record + CallId + "-" + SerStatus + "(" + SerPro + ")"; execute Step2.8;
Step2.8, j++; execute Step2.2;
Step2.9, h++; execute Step2.4;
Step2.10, add Seruser + "" + CallId + "-" + SerStatus + "(" + SerPro + ")" to list ServiceInteractive; execute Step2.8;
Step2.11, monitor new service call data in real time; obtain the service user name and store it in an initialized variable ServiceUser; obtain the service name and store it in an initialized variable ServicePro; obtain the service call state and store it in an initialized variable ServiceStatus; obtain the call number and store it in an initialized variable ServiceCallId; initialize a temporary variable p=0; execute Step2.12;
Step2.12, read the list ServiceInteractive that stores the service interaction diagram; execute Step2.13;
Step2.13, judge whether ServiceStatus is equal to "Working"; if so, execute Step2.14; otherwise, execute Step2.20;
Step2.14, judge whether p is smaller than the length of list ServiceInteractive; if so, execute Step2.15; otherwise, execute Step2.19;
Step2.15, initialize a String variable Serveruser for storing the service user name obtained from the record with index p in list ServiceInteractive; execute Step2.16;
Step2.16, judge whether Serveruser is equal to ServiceUser; if so, execute Step2.17; otherwise, execute Step2.18;
step2.17, replacing the data with the index p recorded by the list serviceinformation with the data with the index p recorded by the list serviceinformation +servicedetails + "-" servicedetails + ", assigning the time as the current time, adding the time and the serviceinformation into the list serviceinformation list, and executing Step3;
step2.18, p++, step2.14 is performed;
step2.19, adding ServiceElnteractive to ServiceUser+ ":" +ServiceCallId+ "(" +ServicePro+ ")", assigning time to the current time, adding time and ServiceElteractive to the list ServiceElteracthlist, and executing Step3;
Step2.20, judging whether p is smaller than the length of the list ServiceInteractive; if so, executing Step2.21, otherwise executing Step2.29;
Step2.21, initializing a list ServerCallId for storing the service call numbers obtained from the data at index p of the list ServiceInteractive, initializing a String variable ServerUser for storing the service user name obtained from the same data, initializing a temporary variable q=0, and executing Step2.22;
Step2.22, judging whether ServerUser is equal to ServiceUser; if so, executing Step2.23, otherwise executing Step2.28;
Step2.23, judging whether q is smaller than the length of the list ServerCallId; if so, executing Step2.24, otherwise executing Step2.27;
Step2.24, judging whether the data at index q of the list ServerCallId is equal to ServiceCallId; if so, executing Step2.25, otherwise executing Step2.26;
Step2.25, updating ServiceInteractive with the current service call data given by ServiceUser, ServicePro, ServiceStatus and ServiceCallId, and executing Step3;
Step2.26, q++, and executing Step2.23;
Step2.27, replacing the data at index p of the list ServiceInteractive with that data + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
Step2.28, p++, and executing Step2.20;
Step2.29, adding ServiceUser + ":" + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3.
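The construction of the service interaction diagram from the running call list (Step2.1 to Step2.10) can be sketched in Python as follows. This is only a minimal illustration: the function name `build_interaction_diagram` and the sample records are assumptions, and the records are assumed to use the colon-separated layout number:user:service:state.

```python
def build_interaction_diagram(transfer_list):
    """Group running calls by service user (sketch of Step2.2 to Step2.10)."""
    diagram = []  # plays the role of the list ServiceInteractive
    for record in transfer_list:
        # Assumed record layout: CallId:ServiceUser:ServicePro:ServiceRunStatus
        call_id, ser_user, ser_pro, ser_status = record.split(":")
        edge = f"{call_id}-{ser_status}({ser_pro})"
        for h, entry in enumerate(diagram):
            if entry.split(":", 1)[0] == ser_user:   # Step2.6: same service user
                diagram[h] = entry + edge             # Step2.7: extend the entry
                break
        else:
            diagram.append(f"{ser_user}:{edge}")      # Step2.10: new entry
    return diagram


# Illustrative input only.
transfer_list = [
    "0:Hotel:Cinema:Working",
    "1:Hotel:Wineshop:Working",
    "2:Plane:Hotel:Working",
]
service_interactive = build_interaction_diagram(transfer_list)
print(service_interactive)
```

Each entry keeps one service user followed by all of its calls, so looking up a user is a linear scan, exactly as in the step list; a dictionary keyed by user would be a natural alternative.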
Optionally, the service fault probability is obtained according to the service interaction diagram and the change record of the service interaction diagram, which comprises the following specific steps:
Step3.1, initializing an empty list ColumnList for storing the column indexes of the monitoring data, initializing temporary variables g=0 and v=0, and executing Step3.2;
Step3.2, reading the change record ServiceInteractiveList of the service interaction diagram before time, and executing Step3.3;
Step3.3, reading the service interaction diagram ServiceInteractive at time, and executing Step3.4;
Step3.4, traversing ServiceInteractive, storing the names of the running services in a list ServiceList, and executing Step3.5;
Step3.5, judging whether v is smaller than the length of ServiceList; if so, executing Step3.6, otherwise ending;
Step3.6, reading the running data set A of the service whose name is at index v of the list ServiceList, and executing Step3.7; wherein the columns of the running data set A are the monitoring index categories;
Step3.7, reading the environment data set B of the service whose name is at index v of the list ServiceList, and executing Step3.8; wherein the columns of the environment data set B are the monitoring index categories;
Step3.8, using the data sets A and B to obtain the running data and environment data, before time, of the service whose name is at index v of the list ServiceList, storing them in a list NormData, and executing Step3.9;
Step3.9, using the data sets A and B to obtain the running data and environment data, at time, of the service whose name is at index v of the list ServiceList, storing them in a list AbnormData, and executing Step3.10;
Step3.10, judging whether g is smaller than the number of columns of NormData; if so, executing Step3.11, otherwise executing Step3.15;
Step3.11, initializing a list PDict for storing the result of the ADF test of the stationarity of column g of NormData, including information such as the probability value and the critical values, and executing Step3.12; wherein the ADF test determines whether a sequence has a unit root: if the sequence is stationary, no unit root exists; otherwise, a unit root exists;
Step3.12, judging whether PDict[1] < 0.05 && PDict[0] < PDict[4]['5%']; if so, executing Step3.13, otherwise executing Step3.14; wherein PDict[1] is the probability value of the ADF test, PDict[0] is the statistic of the ADF test, and PDict[4]['5%'] is the critical value of the statistic at the 5% level;
Step3.13, storing the index g of the NormData column in ColumnList, and executing Step3.14;
Step3.14, g++, and executing Step3.10;
Step3.15, screening NormData according to the indexes in ColumnList and assigning the result to NormData, screening AbnormData according to the indexes in ColumnList and assigning the result to AbnormData, and executing Step3.16;
Step3.16, calculating the rank of NormData, assigning the result to rank, and executing Step3.17;
Step3.17, using the PCA algorithm to reduce NormData to rank dimensions, and executing Step3.18; wherein the PCA algorithm is a dimensionality reduction algorithm that projects data into a new linear space by a linear transformation and is typically used to reduce the dimensionality of a data set;
Step3.18, using the PCA algorithm to map AbnormData into rank dimensions, and executing Step3.19;
Step3.19, calculating the mean of each column of NormData, storing the means as a list u, and executing Step3.20;
Step3.20, calculating the inverse matrix invCov of the covariance matrix of NormData, and executing Step3.21;
Step3.21, calculating the row vector of monitoring index mean differences diff = AbnormData - u, and executing Step3.22;
Step3.22, calculating the transposed column vector diffT of the row vector diff, and executing Step3.23;
Step3.23, calculating the degree of service deviation T = diff × invCov × diffT, and executing Step3.24;
Step3.24, calculating the in-degree, in each interaction diagram of the list ServiceInteractiveList, of the service whose name is at index v of ServiceList, storing the results as a list IList, and executing Step3.25;
Step3.25, initializing a variable EI for storing the mean service in-degree calculated from IList, initializing a variable SI for storing the variance of the service in-degree calculated from IList, and executing Step3.26;
Step3.26, initializing a variable IN for storing the in-degree, in the interaction diagram ServiceInteractive, of the service whose name is at index v of ServiceList, and executing Step3.27;
Step3.27, calculating the fault probability correction parameter f = 1 + abs((IN - EI)/(SI + 1)) of the service whose name is at index v of the list ServiceList, wherein abs is the absolute value operation, and executing Step3.28;
Step3.28, calculating the fault probability k = 1 - int(exp(-x/2)/(2*x*pi)^(1/2), x, T/f, +inf) of the service whose name is at index v of the list ServiceList; wherein pi is the circle constant π, exp(-x/2) is the natural constant e raised to the power -x/2, +inf is positive infinity, int() is the integration function, and int(exp(-x/2)/(2*x*pi)^(1/2), x, T/f, +inf) is the integral of the function f(x) = exp(-x/2)/(2*x*pi)^(1/2) with respect to x from the lower limit T/f to the upper limit positive infinity; and executing Step3.29;
Step3.29, outputting k, and executing Step3.30;
Step3.30, v++, and executing Step3.5.
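Step3.19 to Step3.23 and Step3.28 can be illustrated with the standard-library Python sketch below. Several points are assumptions rather than statements of the method: the covariance is taken as the sample covariance (divisor n-1), the linear system Cov·y = diffT is solved instead of forming invCov explicitly (which yields the same T), the PCA step is omitted, and the integrand of Step3.28 is read as exp(-x/2)/sqrt(2·π·x), the chi-square density with one degree of freedom, so that the integral has the closed form erfc(sqrt(T/(2f))). The function names and the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def deviation_degree(norm_data, abnorm_row):
    """T = diff x invCov x diffT (Step3.19 to Step3.23), without PCA."""
    n, d = len(norm_data), len(abnorm_row)
    u = [sum(row[j] for row in norm_data) / n for j in range(d)]
    # Assumption: sample covariance with divisor n - 1.
    cov = [[sum((row[i] - u[i]) * (row[j] - u[j]) for row in norm_data) / (n - 1)
            for j in range(d)] for i in range(d)]
    diff = [abnorm_row[j] - u[j] for j in range(d)]
    y = solve(cov, diff)  # y equals invCov x diffT
    return sum(diff[j] * y[j] for j in range(d))


def fault_probability(t, f):
    """k = 1 - integral from T/f to +inf, under the chi-square(1) reading."""
    return math.erf(math.sqrt(t / (2.0 * f)))


# Toy values, not taken from the monitoring tables.
norm = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
t = deviation_degree(norm, [3.0, 1.0])
print(t, fault_probability(t, 1.0), fault_probability(t, 1.5))
```

Under this reading a larger correction parameter f shrinks T/f and therefore lowers k, which is the damping effect that Step3.27 is meant to provide.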
Further, a cloud environment service fault probability computing system based on a service interaction graph may be provided, including: the extraction module is used for extracting service running state call data in the target data set; the acquisition module is used for establishing a service interaction diagram according to the service running state calling data and acquiring a change record of the service interaction diagram according to the current time; and the calculation module is used for calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram.
Optionally, the calculation module is configured to: traverse the names of the services in the running state in the service interaction diagram and read, in turn, the running data and environment data of the service corresponding to each service name; screen stationary monitoring indexes according to the running data and environment data of the service before the reading time; screen, according to the stationary monitoring indexes, the running data and environment data of the service before the reading time and at the reading time, and calculate the degree to which the service deviates from normal; and correct that degree of deviation using the change record of the service interaction diagram and the change of the in-degree of the service in the service interaction diagram, to obtain the fault probability of the service.
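The in-degree correction performed by the calculation module can be sketched as follows. The sketch assumes that the in-degree of a service is the number of calls, in any state, whose called service is that service, and that the variance SI is the population variance; neither choice is stated explicitly in the text. The function names are hypothetical, and the snapshots follow the entry format user:number-state(service).

```python
import re
from statistics import mean, pvariance


def in_degree(diagram, service):
    """Count calls whose called service (the name in parentheses) is `service`."""
    pattern = re.compile(r"\(" + re.escape(service) + r"\)")
    return sum(len(pattern.findall(entry)) for entry in diagram)


def correction_parameter(snapshots, current, service):
    """f = 1 + abs((IN - EI) / (SI + 1))."""
    ilist = [in_degree(d, service) for d in snapshots]  # IList
    ei = mean(ilist)        # EI: mean in-degree over the change record
    si = pvariance(ilist)   # SI: assumed to be the population variance
    in_now = in_degree(current, service)  # IN: in-degree in the current diagram
    return 1 + abs((in_now - ei) / (si + 1))


# Earlier snapshots of the interaction diagram (illustrative change record).
snapshots = [
    ["Hotel:0-Working(Cinema)1-Working(Wineshop)", "Plane:2-Working(Hotel)"],
    ["Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane)",
     "Plane:2-Working(Hotel)"],
    ["Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane)",
     "Plane:2-Working(Hotel)", "Wineshop:6-Working(Plane)"],
    ["Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema)",
     "Plane:2-Working(Hotel)", "Wineshop:6-Working(Plane)"],
]
current = [
    "Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema)",
    "Plane:2-Working(Hotel)", "Wineshop:6-End(Plane)",
]
print(correction_parameter(snapshots, current, "Hotel"))
print(correction_parameter(snapshots, current, "Plane"))
```

A service whose in-degree has not moved relative to its history gets f = 1, so its deviation degree is left unchanged; a service whose in-degree has shifted gets f > 1.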
Still further, a terminal may be provided comprising a processor, a memory and a computer program stored in the memory and executable on the processor, the processor being configured to perform the method of any of the above.
Still further, a computer readable storage medium may be provided, the computer readable storage medium comprising a stored program which, when executed by a processor, causes the processor to implement the method of any of the above.
Example 2: As shown in FIGS. 1-7, a cloud environment service fault probability calculation method based on a service interaction diagram is executed in the order Step1, Step2, Step3 until execution terminates;
the specific steps are as follows:
the service call data attributes are shown in Table 1, which gives the category labels of the service call data attributes and their meanings;
Table 1 Service call data attribute table
Attribute         Meaning
CallId            Service call number
ServiceUser       Service user name
ServicePro        Service name
ServiceRunStatus  Service call state
Further, the method may be provided with the specific steps of:
the Step1 is specifically as follows:
Step1.1, reading the service call data set D monitored in the cloud environment, extracting the service call data from D, separating the attributes of each piece of service call data with colons, storing the results in an initialized list ServiceDataList, initializing a temporary variable i=0, and executing Step1.2; wherein the attributes of the service call data comprise the service call number, the service user name, the service name and the service call state;
Table 2 Service call data set table
CallId  ServiceUser  ServicePro  ServiceRunStatus
0       Hotel        Cinema      Working
1       Hotel        Wineshop    Working
2       Plane        Hotel       Working
3       ScenicSpot   Train       End
ServiceDataList=[0:Hotel:Cinema:Working,1:Hotel:Wineshop:Working,2:Plane:Hotel:Working,3:ScenicSpot:Train:End];
Step1.2, initializing a variable ServiceStatus1="Working", initializing an empty list TransferList for storing the service running state call data, and executing Step1.3;
Step1.3, judging whether i is smaller than the length of the list ServiceDataList; if so, executing Step1.4, otherwise executing Step2;
when i=0, since the length of the list ServiceDataList is 4 and 0<4, Step1.4 is executed;
when i=4, since the length of the list ServiceDataList is 4 and 4 is not less than 4, Step2 is executed;
Step1.4, initializing a variable ServiceRunStatus for storing the service call state obtained from the data at index i of the list ServiceDataList, and executing Step1.5;
when i=0, ServiceRunStatus="Working"; when i=3, ServiceRunStatus="End";
Step1.5, judging whether ServiceRunStatus is equal to ServiceStatus1; if so, executing Step1.6, otherwise executing Step1.7;
when i=0, since ServiceStatus1="Working" and ServiceRunStatus="Working", Step1.6 is executed;
when i=3, since ServiceStatus1="Working" and ServiceRunStatus="End", Step1.7 is executed;
Step1.6, adding the data at index i of the list ServiceDataList to the list TransferList, and executing Step1.7;
when i=0, TransferList adds the record "0:Hotel:Cinema:Working";
Step1.7, i++, and executing Step1.3;
by looping through Step1.3 to Step1.7, the list of running service calls is finally obtained as follows:
TransferList=[0:Hotel:Cinema:Working,1:Hotel:Wineshop:Working,2:Plane:Hotel:Working].
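The loop of Step1.3 to Step1.7 amounts to a filter on the call state. A minimal Python sketch, assuming the colon-separated record layout of ServiceDataList shown above (the function name `running_calls` is hypothetical):

```python
def running_calls(service_data_list, wanted_status="Working"):
    """Keep the records whose call state equals wanted_status (Step1.3 to Step1.7)."""
    transfer_list = []
    for record in service_data_list:
        # Record layout: CallId:ServiceUser:ServicePro:ServiceRunStatus
        service_run_status = record.split(":")[3]
        if service_run_status == wanted_status:   # Step1.5
            transfer_list.append(record)           # Step1.6
    return transfer_list


service_data_list = [
    "0:Hotel:Cinema:Working",
    "1:Hotel:Wineshop:Working",
    "2:Plane:Hotel:Working",
    "3:ScenicSpot:Train:End",
]
transfer_list = running_calls(service_data_list)
print(transfer_list)
```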
the Step2 is specifically as follows:
Step2.1, reading the service running state call data list TransferList, initializing an empty list ServiceInteractive for storing the service interaction diagram, initializing a temporary variable j=0, initializing a time variable time for storing the current time, initializing an empty list ServiceInteractiveList for storing the change record of the service interaction diagram, and executing Step2.2; TransferList=[0:Hotel:Cinema:Working,1:Hotel:Wineshop:Working,2:Plane:Hotel:Working]; time="2021-06-05 14:02:00";
Step2.2, judging whether j is smaller than the length of the list TransferList; if so, executing Step2.3; otherwise, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step2.11;
when j=0, since the length of the list TransferList is 3 and 0<3, Step2.3 is executed;
when j=3, since the length of the list TransferList is 3 and 3 is not less than 3, time="2021-06-05 14:02:07", ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)]], and Step2.11 is executed;
Step2.3, using the data at index j of the list TransferList: obtaining the service user name and storing it in an initialized variable SerUser; obtaining the service name and storing it in an initialized variable SerPro; obtaining the service call state and storing it in an initialized variable SerStatus; obtaining the service call number and storing it in an initialized variable CallId; initializing h=0, and executing Step2.4;
when j=0, since TransferList.get(0)="0:Hotel:Cinema:Working", SerUser="Hotel", SerPro="Cinema", SerStatus="Working", CallId="0";
Step2.4, judging whether h is smaller than the length of the list ServiceInteractive; if so, executing Step2.5, otherwise executing Step2.10;
when j=0 and h=0, since the length of the list ServiceInteractive is 0 and 0 is not less than 0, Step2.10 is executed;
when j=1 and h=0, since the length of the list ServiceInteractive is 1 and 0<1, Step2.5 is executed;
Step2.5, initializing a String variable ServiceUser1 for storing the service user name obtained from the data at index h of the list ServiceInteractive, and executing Step2.6;
when j=1 and h=0, since the data at index h of ServiceInteractive is "Hotel:0-Working(Cinema)", ServiceUser1="Hotel";
Step2.6, judging whether ServiceUser1 is equal to SerUser; if so, executing Step2.7, otherwise executing Step2.9;
when j=1 and h=0, since ServiceUser1=SerUser="Hotel", Step2.7 is executed;
when j=2 and h=0, since ServiceUser1="Hotel" and SerUser="Plane", Step2.9 is executed;
Step2.7, replacing the data at index h of the list ServiceInteractive with that data + CallId + "-" + SerStatus + "(" + SerPro + ")", and executing Step2.8;
when j=1 and h=0, the data at index 0 of the list ServiceInteractive is replaced with "Hotel:0-Working(Cinema)1-Working(Wineshop)", and Step2.8 is executed;
Step2.8, j++, and executing Step2.2;
Step2.9, h++, and executing Step2.4;
Step2.10, adding SerUser + ":" + CallId + "-" + SerStatus + "(" + SerPro + ")" to the list ServiceInteractive, and executing Step2.8;
when j=0 and h=0, ServiceInteractive adds "Hotel:0-Working(Cinema)", and Step2.8 is executed;
Step2.11, monitoring new service call data in real time; obtaining the service user name and storing it in an initialized variable ServiceUser; obtaining the service name and storing it in an initialized variable ServicePro; obtaining the service call state and storing it in an initialized variable ServiceStatus; obtaining the call number and storing it in an initialized variable ServiceCallId; initializing p=0, and executing Step2.12;
when the service call data monitored in real time is "4:Hotel:Plane:Working", ServiceUser="Hotel", ServicePro="Plane", ServiceStatus="Working", ServiceCallId="4";
Step2.12, reading the list ServiceInteractive that stores the service interaction diagram, and executing Step2.13;
after Step2.2 to Step2.10 have been executed in a loop, ServiceInteractive=[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)];
Step2.13, judging whether ServiceStatus is equal to "Working"; if so, executing Step2.14, otherwise executing Step2.20;
when the service call data monitored in real time is "4:Hotel:Plane:Working", Step2.14 is executed;
when the service call data monitored in real time is "0:Hotel:Cinema:End", Step2.20 is executed;
Step2.14, judging whether p is smaller than the length of the list ServiceInteractive; if so, executing Step2.15, otherwise executing Step2.19;
when p=0, since the length of the list ServiceInteractive is 2 and 0<2, Step2.15 is executed;
when p=2, since the length of the list ServiceInteractive is 2 and 2 is not less than 2, Step2.19 is executed;
Step2.15, initializing a String variable ServerUser for storing the service user name obtained from the data at index p of the list ServiceInteractive, and executing Step2.16;
when p=1, since the data at index 1 of the list ServiceInteractive is "Plane:2-Working(Hotel)", ServerUser="Plane", and Step2.16 is executed;
Step2.16, judging whether ServerUser is equal to ServiceUser; if so, executing Step2.17, otherwise executing Step2.18;
when the service call data monitored in real time is "4:Hotel:Plane:Working" and p=0, ServerUser=ServiceUser="Hotel", and Step2.17 is executed;
when the service call data monitored in real time is "6:Wineshop:Plane:Working" and p=0, ServerUser="Hotel" and ServiceUser="Wineshop", and Step2.18 is executed;
Step2.17, replacing the data at index p of the list ServiceInteractive with that data + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
when the service call data monitored in real time is "4:Hotel:Plane:Working" and p=0, the data at index 0 of the list ServiceInteractive is replaced with "Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane)", time="2021-06-05 14:02:11", and ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)]];
Step2.18, p++, and executing Step2.14;
Step2.19, adding ServiceUser + ":" + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
when the service call data monitored in real time is "6:Wineshop:Plane:Working" and p=2, the list ServiceInteractive adds "Wineshop:6-Working(Plane)", time="2021-06-05 14:02:13", and ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)],"2021-06-05 14:02:13",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)]];
Step2.20, judging whether p is smaller than the length of the list ServiceInteractive; if so, executing Step2.21, otherwise executing Step2.29;
when p=0, since the length of the list ServiceInteractive is 3 and 0<3, Step2.21 is executed;
when p=3, since the length of the list ServiceInteractive is 3 and 3 is not less than 3, Step2.29 is executed;
Step2.21, initializing a list ServerCallId for storing the service call numbers obtained from the data at index p of the list ServiceInteractive, initializing a String variable ServerUser for storing the service user name obtained from the same data, initializing q=0, and executing Step2.22;
when p=0, since the data at index 0 of the list ServiceInteractive is "Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane)", ServerCallId=[0,1,4], ServerUser="Hotel", and Step2.22 is executed;
Step2.22, judging whether ServerUser is equal to ServiceUser; if so, executing Step2.23, otherwise executing Step2.28;
when the service call data monitored in real time is "0:Hotel:Cinema:End" and p=0, Step2.23 is executed;
when the service call data monitored in real time is "6:Wineshop:Plane:End" and p=0, ServiceUser="Wineshop" and ServerUser="Hotel", and Step2.28 is executed;
Step2.23, judging whether q is smaller than the length of the list ServerCallId; if so, executing Step2.24, otherwise executing Step2.27;
when p=0 and q=0 and the service call data monitored in real time is "0:Hotel:Cinema:End", since the length of the list ServerCallId is 3 and 0<3, Step2.24 is executed;
when p=0 and q=3 and the service call data monitored in real time is "7:Hotel:Cinema:End", since the length of the list ServerCallId is 3 and 3 is not less than 3, Step2.27 is executed;
Step2.24, judging whether the data at index q of the list ServerCallId is equal to ServiceCallId; if so, executing Step2.25, otherwise executing Step2.26;
when the service call data monitored in real time is "0:Hotel:Cinema:End", p=0 and q=0, since the data at index 0 of ServerCallId is "0" and ServiceCallId="0", Step2.25 is executed;
when the service call data monitored in real time is "7:Hotel:Cinema:End", p=0 and q=0, since the data at index 0 of ServerCallId is "0" and ServiceCallId="7", Step2.26 is executed;
Step2.25, updating ServiceInteractive with the current service call data given by ServiceUser, ServicePro, ServiceStatus and ServiceCallId, and executing Step3;
when the service call data monitored in real time is "0:Hotel:Cinema:End", p=0 and q=0, since ServiceUser="Hotel", ServicePro="Cinema", ServiceStatus="End" and ServiceCallId="0", ServiceInteractive=[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)];
Step2.26, q++, and executing Step2.23;
Step2.27, replacing the data at index p of the list ServiceInteractive with that data + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")", assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3;
when the service call data monitored in real time is "7:Hotel:Cinema:End", p=0 and q=3, the data at index 0 of the list ServiceInteractive is replaced with "Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema)", time="2021-06-05 14:02:15", and ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)],"2021-06-05 14:02:13",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)],"2021-06-05 14:02:15",[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)]];
when the service call data monitored in real time is "6:Wineshop:Plane:End", p=2 and q=0, the data at index 2 of the list ServiceInteractive is replaced with "Wineshop:6-End(Plane)", time="2021-06-05 14:02:20", and ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)],"2021-06-05 14:02:13",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)],"2021-06-05 14:02:15",[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)],"2021-06-05 14:02:20",[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-End(Plane)]];
Step2.28, p++, and executing Step2.20;
Step2.29, adding ServiceUser + ":" + ServiceCallId + "-" + ServiceStatus + "(" + ServicePro + ")" to the list ServiceInteractive, assigning the current time to time, adding time and ServiceInteractive to the list ServiceInteractiveList, and executing Step3.
As can be seen from Step2, whenever a piece of service call data monitored in real time changes the service interaction diagram, Step2 terminates and the subsequent flow is executed. The embodiment of Step3 below describes the subsequent flow using, as an example, the result after the service interaction diagram changed at time "2021-06-05 14:02:20".
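The real-time update of Step2.11 to Step2.29 can be sketched in Python as below. The sketch is an illustration under assumptions: the function name `apply_call` and the regular expression are hypothetical, entries are assumed to follow the format user:number-state(service), and appending a timestamped copy of the diagram to ServiceInteractiveList is left to the caller.

```python
import re

# One call inside an entry, e.g. "4-Working(Plane)".
EDGE = re.compile(r"(\d+)-(\w+)\((\w+)\)")


def apply_call(diagram, record):
    """Return a new diagram with one monitored call applied."""
    call_id, user, pro, status = record.split(":")
    edge = f"{call_id}-{status}({pro})"
    updated = list(diagram)
    for p, entry in enumerate(updated):
        entry_user, calls = entry.split(":", 1)
        if entry_user != user:
            continue                                  # Step2.18 / Step2.28
        known_ids = [m.group(1) for m in EDGE.finditer(calls)]
        if status != "Working" and call_id in known_ids:
            # Step2.25: overwrite the stored state of this call.
            calls = EDGE.sub(
                lambda m: edge if m.group(1) == call_id else m.group(0), calls)
            updated[p] = f"{user}:{calls}"
        else:
            # Step2.17 / Step2.27: append the call to the user's entry.
            updated[p] = entry + edge
        return updated
    updated.append(f"{user}:{edge}")                  # Step2.19 / Step2.29
    return updated


diagram = ["Hotel:0-Working(Cinema)1-Working(Wineshop)", "Plane:2-Working(Hotel)"]
for call in ["4:Hotel:Plane:Working", "6:Wineshop:Plane:Working",
             "0:Hotel:Cinema:End", "7:Hotel:Cinema:End", "6:Wineshop:Plane:End"]:
    diagram = apply_call(diagram, call)
print(diagram)
```

Returning a new list rather than mutating the input makes it straightforward to store each result as an independent snapshot in the change record.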
The Step3 is specifically as follows:
Step3.1, initializing an empty list ColumnList for storing the column indexes of the monitoring data, initializing g=0 and v=0, and executing Step3.2;
Step3.2, reading the change record ServiceInteractiveList of the service interaction diagram before time, and executing Step3.3;
time="2021-06-05 14:02:20";
ServiceInteractiveList=["2021-06-05 14:02:07",[Hotel:0-Working(Cinema)1-Working(Wineshop),Plane:2-Working(Hotel)],"2021-06-05 14:02:11",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel)],"2021-06-05 14:02:13",[Hotel:0-Working(Cinema)1-Working(Wineshop)4-Working(Plane),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)],"2021-06-05 14:02:15",[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-Working(Plane)]];
Step3.3, reading the service interaction diagram ServiceInteractive at time, and executing Step3.4;
ServiceInteractive=[Hotel:0-End(Cinema)1-Working(Wineshop)4-Working(Plane)7-End(Cinema),Plane:2-Working(Hotel),Wineshop:6-End(Plane)];
Step3.4, traversing ServiceInteractive, storing the names of the running services in a list ServiceList, and executing Step3.5;
ServiceList=[Hotel,Wineshop,Plane];
Step3.5, judging whether v is smaller than the length of ServiceList; if so, executing Step3.6, otherwise ending execution;
Step3.6, reading the running data set A of the service whose name is at index v of the list ServiceList, and executing Step3.7;
(the following description mainly uses the running data of the service "Hotel" as an example);
Table 3 Monitoring index attribute table of service running data
Attribute  Meaning
time       Acquisition time of the service running data
%usr       Percentage of CPU occupied by the service process in user space
%system    Percentage of CPU occupied by the service process in kernel space
%guest     Percentage of CPU occupied by the service process in the virtual machine
%mem       Percentage of memory occupied by the service process
kB_rd/s    Kilobytes the service process reads from disk per second
kB_wr/s    Kilobytes the service process writes to disk per second
Table 4 Hotel service running data table
time                 %usr  %system  %guest  %mem  kB_rd/s  kB_wr/s
2021-06-05 14:02:00  0.61  0.55     0.7     0.38  0.54     0.31
2021-06-05 14:02:01  0.82  0.41     0.35    0.34  1.29     0.09
2021-06-05 14:02:02  0.56  0.73     0.15    0.43  1.74     0.89
2021-06-05 14:02:03  0.68  0.99     0.13    0.37  0.01     0.30
2021-06-05 14:02:04  0.47  0.35     0.53    0.41  0.77     0.54
2021-06-05 14:02:05  0.35  0.61     0.13    0.49  0.64     0.23
2021-06-05 14:02:06  0.84  0.57     0.09    0.50  0.12     0.03
2021-06-05 14:02:07  0.59  0.88     0.23    0.46  0.40     0.01
2021-06-05 14:02:08  0.92  0.51     0.43    0.45  0.45     0.47
2021-06-05 14:02:09  0.10  0.39     0.13    0.39  0.46     0.31
2021-06-05 14:02:10  0.57  0.32     0.22    0.32  0.54     1.38
2021-06-05 14:02:11  0.15  0.92     0.50    0.32  0.14     0.09
2021-06-05 14:02:12  0.73  0.99     0.75    0.33  1.75     0.24
2021-06-05 14:02:13  0.6   0.05     0.65    0.32  0.24     0.78
2021-06-05 14:02:14  0.2   0.81     0.11    0.31  0.23     1.37
2021-06-05 14:02:15  0.71  0.16     0.24    0.32  0.57     0.89
2021-06-05 14:02:16  0.44  0.51     0.50    0.37  0.24     1.15
2021-06-05 14:02:17  0.8   0.84     0.53    0.42  0.98     1.24
2021-06-05 14:02:18  0.5   0.07     0.41    0.22  0.08     0.52
2021-06-05 14:02:19  0.54  0.7      0.7     0.29  0.14     0.18
2021-06-05 14:02:20  0.92  0.51     0.45    0.34  0.0      0.0
Step3.7, reading the environment data set B of the service whose name is at index v of the list ServiceList, and executing Step3.8;
(the following description mainly uses the CPU environment data of the service "Hotel", v=0, as an example);
Table 5 Service environment data monitoring index attribute table
Table 6 Hotel service environment data table
time                 cpu_user  cpu_sys  cpu_free
2021-06-05 14:02:00  6         8        86
2021-06-05 14:02:01  4         14       82
2021-06-05 14:02:02  5         11       84
2021-06-05 14:02:03  2         13       84
2021-06-05 14:02:04  3         11       86
2021-06-05 14:02:05  3         4        93
2021-06-05 14:02:06  6         5        92
2021-06-05 14:02:07  4         11       85
2021-06-05 14:02:08  3         11       86
2021-06-05 14:02:09  3         10       87
2021-06-05 14:02:10  19        10       71
2021-06-05 14:02:11  5         13       82
2021-06-05 14:02:12  4         9        87
2021-06-05 14:02:13  5         4        91
2021-06-05 14:02:14  8         6        86
2021-06-05 14:02:15  9         14       77
2021-06-05 14:02:16  3         13       84
2021-06-05 14:02:17  5         4        91
2021-06-05 14:02:18  9         14       77
2021-06-05 14:02:19  6         11       83
2021-06-05 14:02:20  70        5        30
Step3.8, using the data sets A and B to obtain the running data and environment data, before time, of the service whose name is at index v of the list ServiceList, storing them in a list NormData, and executing Step3.9;
NormData=[[0.61,0.55,0.7,0.38,0.54,0.31,6.0,8.0,86.0],[0.82,0.41,0.35,0.34,1.29,0.09,4.0,14.0,82.0],[0.56,0.73,0.15,0.43,1.74,0.89,5.0,11.0,84.0],[0.68,0.99,0.13,0.37,0.01,0.30,3.0,13.0,84.0],[0.47,0.35,0.53,0.41,0.77,0.54,3.0,11.0,86.0],[0.35,0.61,0.13,0.49,0.64,0.23,3.0,4.0,93.0],[0.84,0.57,0.09,0.50,0.12,0.03,3.0,5.0,92.0],[0.59,0.88,0.23,0.46,0.40,0.01,4.0,11.0,85.0],[0.92,0.51,0.43,0.45,0.45,0.47,3.0,11.0,86.0],[0.1,0.39,0.13,0.39,0.46,0.31,3.0,10.0,87.0],[0.57,0.32,0.22,0.32,0.54,1.38,19.0,10.0,71.0],[0.15,0.92,0.5,0.32,0.14,0.09,5.0,13.0,82.0],[0.73,0.99,0.75,0.33,1.75,0.57,4.0,9.0,87.0],[0.6,0.05,0.65,0.32,0.24,0.78,5.0,4.0,91.0],[0.2,0.81,0.11,0.31,0.23,1.37,8.0,6.0,86.0],[0.71,0.16,0.24,0.32,0.57,0.89,9.0,14.0,77.0],[0.44,0.51,0.5,0.37,0.24,1.15,3.0,13.0,84.0],[0.8,0.84,0.53,0.42,0.98,1.24,5.0,4.0,91.0],[0.5,0.07,0.41,0.22,0.08,0.52,9.0,14.0,77.0],[0.54,0.7,0.7,0.29,0.14,0.18,6.0,11.0,83.0]], and Step3.9 is executed;
Step3.9, using the data sets A and B to obtain the running data and environment data, at time, of the service whose name is at index v of the list ServiceList, storing them in a list AbnormData, and executing Step3.10;
AbnormData=[[0.92,0.51,0.45,0.22,0.0,0.0,70.0,5.0,30.0]], and Step3.10 is executed;
step3.10, judging whether g is smaller than the NormData column number of the list, if the condition is met, executing step3.11, otherwise executing step3.15;
step3.11, an initialization list PDict, which is used for storing the results of ADF checking the stability of NormData column g, including information such as probability value, confidence level, and the like, and executing step3.12; wherein the ADF test is to determine if the sequence has a root of unity: if the sequence is stable, no unit root exists; otherwise, a unit root exists;
when g=0, PDict=(-4.075336199834505, 0.0010635440168466083, 8, 11, {'1%': -4.223238279489106, '5%': -3.189368925619835, '10%': -2.729839421487603}, -19.87455866559617);
when g=1, PDict=(33.589052234909, 1.0, 8, 11, {'1%': -4.223238279489106, '5%': -3.189368925619835, '10%': -2.729839421487603}, -138.91275540402972);
Step3.12, judging whether PDict[1] < 0.05 && PDict[0] < PDict[4]['5%'] holds, if the condition is met, executing Step3.13, otherwise executing Step3.14, wherein PDict[1] is the probability value (p-value) of the ADF test, PDict[0] is the statistic of the ADF test, and PDict[4]['5%'] is the critical value of the statistic at the 5% confidence level;
when g=0, PDict[1] < 0.05 && PDict[0] < PDict[4]['5%'] returns True, since 0.0010635440168466083 < 0.05 and -4.075336199834505 < -3.189368925619835, so Step3.13 is executed;
when g=1, since 1.0 is not less than 0.05 and 33.589052234909 is not less than -3.189368925619835, Step3.14 is executed;
Step3.13, storing the index g of the NormData column into the list ColumnList, and executing Step3.14;
when g=8, ColumnList=[0,2,4,5,6,7,8]; Step3.14 is executed;
Step3.14, g++; executing Step3.10;
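The screening loop of Step3.10 to Step3.14 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the tuple layout (statistic, p-value, lags used, number of observations, critical values, information criterion) is assumed from the PDict values printed above, which resemble the return value of the `adfuller` function in statsmodels.

```python
# Sketch of the Step3.10-Step3.14 screening loop (hypothetical helper names).
# Assumed PDict layout: (statistic, p-value, used lags, observations,
# critical values keyed by level, information criterion).

def is_stationary(pdict, alpha=0.05, level="5%"):
    """Return True when both conditions of Step3.12 hold."""
    statistic, p_value, critical_values = pdict[0], pdict[1], pdict[4]
    return p_value < alpha and statistic < critical_values[level]


def screen_columns(adf_results):
    """Collect the column indexes g whose ADF result passes Step3.12."""
    column_list = []
    for g, pdict in enumerate(adf_results):
        if is_stationary(pdict):
            column_list.append(g)  # Step3.13
    return column_list


CRITICAL = {
    "1%": -4.223238279489106,
    "5%": -3.189368925619835,
    "10%": -2.729839421487603,
}
# The two PDict values given in the text for g=0 and g=1.
PDICT_G0 = (-4.075336199834505, 0.0010635440168466083, 8, 11, CRITICAL, -19.87455866559617)
PDICT_G1 = (33.589052234909, 1.0, 8, 11, CRITICAL, -138.91275540402972)

# Column 0 passes the check; column 1 does not.
print(screen_columns([PDICT_G0, PDICT_G1]))
```

Only the first two columns are shown because the text gives PDict for g=0 and g=1 only.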
Step3.15, screening the columns of NormData according to the indexes in ColumnList and assigning the result to NormData, screening the columns of AbnormData according to the indexes in ColumnList and assigning the result to AbnormData, and executing Step3.16;
NormData=[[0.61,0.7,0.54,0.31,6.0,8.0,86.0],[0.82,0.35,1.29,0.09,4.0,14.0,82.0],[0.56,0.15,1.74,0.89,5.0,11.0,84.0],[0.68,0.13,0.01,0.3,3.0,13.0,84.0],[0.47,0.53,0.77,0.54,3.0,11.0,86.0],[0.35,0.13,0.64,0.23,3.0,4.0,93.0],[0.84,0.09,0.12,0.03,3.0,5.0,92.0],[0.59,0.23,0.4,0.01,4.0,11.0,85.0],[0.92,0.43,0.45,0.47,3.0,11.0,86.0],[0.1,0.13,0.46,0.31,3.0,10.0,87.0],[0.57,0.22,0.54,1.38,19.0,10.0,71.0],[0.15,0.5,0.14,0.09,5.0,13.0,82.0],[0.73,0.75,1.75,0.57,4.0,9.0,87.0],[0.6,0.65,0.24,0.78,5.0,4.0,91.0],[0.2,0.11,0.23,1.37,8.0,6.0,86.0],[0.71,0.24,0.57,0.89,9.0,14.0,77.0],[0.44,0.5,0.24,1.15,3.0,13.0,84.0],[0.8,0.53,0.98,1.24,5.0,4.0,91.0],[0.5,0.41,0.08,0.52,9.0,14.0,77.0],[0.54,0.7,0.14,0.18,6.0,11.0,83.0]];
AbnormData=[[0.92,0.45,0.0,0.0,70.0,5.0,30.0]]
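The column screening of Step3.15 amounts to keeping, in every row, only the entries whose index appears in ColumnList. A minimal sketch, with a hypothetical helper name and using the first NormData row and the AbnormData row from the example:

```python
# Sketch of Step3.15: keep only the columns listed in ColumnList.
# select_columns is a hypothetical helper name.

def select_columns(rows, column_list):
    """Return a copy of rows containing only the given column indexes."""
    return [[row[i] for i in column_list] for row in rows]


COLUMN_LIST = [0, 2, 4, 5, 6, 7, 8]
norm_row = [0.61, 0.55, 0.7, 0.38, 0.54, 0.31, 6.0, 8.0, 86.0]
abnorm_data = [[0.92, 0.51, 0.45, 0.22, 0.0, 0.0, 70.0, 5.0, 30.0]]

print(select_columns([norm_row], COLUMN_LIST))
print(select_columns(abnorm_data, COLUMN_LIST))
```

The two printed rows match the first row of the screened NormData and the screened AbnormData listed above.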
Step3.16, calculating the rank of NormData, assigning the result to rank, and executing Step3.17; rank=7;
Step3.17, reducing NormData to rank dimensions using the PCA algorithm, and executing Step3.18; wherein the PCA algorithm is a dimensionality reduction algorithm that projects data into a new linear space through a linear transformation and is typically used to reduce the dimensionality of a data set;
NormData=[[-1.51915662e+00,1.67839776e+00,-9.20561037e-02,-3.38842790e-01,-2.64695288e-01,-1.94106294e-01,2.02951168e-15],[3.09969325e+00,-4.21268869e+00,6.41895208e-01,-4.63350517e-01,1.04619896e-02,1.35936248e-01,1.97317363e-15],[7.99512232e-01,-1.22276523e+00,1.20472136e+00,7.61755242e-02,3.38544626e-01,7.17087567e-02,2.19326982e-15],[6.65381589e-01,-4.07184012e+00,-5.48551153e-01,1.35649016e-01,3.94163354e-02,3.03229694e-01,2.52166725e-15],[-1.71497174e+00,-2.53390341e+00,2.17091346e-01,1.34425782e-01,-3.69823257e-02,-1.61444424e-01,2.71151084e-15],[-1.00483507e+01,2.79331143e+00,-1.13096013e-01,-2.90500359e-01,3.62564786e-01,-4.41687077e-02,2.23873657e-15],[-8.85980366e+00,2.01835423e+00,-5.88929521e-01,-3.96373715e-01,2.86496876e-02,3.91458784e-01,1.89719011e-15],[-4.65831542e-01,-1.92124920e+00,-2.95541821e-01,-3.52152210e-01,1.07258662e-01,9.57465920e-02,2.04324234e-15],[-1.71486944e+00,-2.53711857e+00,-5.29178806e-02,8.06016362e-02,-2.75872203e-01,2.70897174e-01,1.89094551e-15],[-2.90592677e+00,-1.78180914e+00,-2.29068506e-01,3.92806065e-02,4.65706687e-01,-1.94198925e-01,1.91513725e-15],[1.72148406e+01,8.61868906e+00,8.86409244e-02,-1.29882178e-01,5.61703665e-02,6.53787970e-02,2.15085305e-15],[3.17340122e+00,-2.79663097e+00,-5.45576843e-01,-1.57469659e-01,1.08848189e-01,-3.83597058e-01,1.89284501e-15],[-2.84439695e+00,-3.65879298e-01,1.17415795e+00,-2.62013060e-01,-2.01017673e-01,-1.97807974e-01,2.12283801e-15],[-7.52459778e+00,4.12149037e+00,-2.84234464e-01,1.44557945e-01,-3.00385928e-01,-1.06971844e-01,2.82677042e-15],[-1.36268013e+00,4.57424522e+00,-2.20091668e-01,6.38934580e-01,3.10575735e-01,-4.54315668e-02,1.92748756e-15],[9.40110648e+00,-9.25540015e-01,1.22701671e-01,1.63800898e-01,-1.56101956e-02,2.12645340e-01,2.28739167e-15],[6.79417525e-01,-4.01643378e+00,-9.75945892e-02,9.11857218e-01,-1.35100285e-01,-5.77634972e-02,1.87225485e-15],[-7.52027715e+00,4.14895123e+00,5.64961382e-01,3.60955151e-01,-2.41847803e-01,9.65943256e-02,1.58271040e-15],[9.39692754e+00,-9.48477781e-01,-4.62117635e-01,-3.15259612e-02,-7.31925704e-02,-4.42096053e-02,2.08775882e-15],[2.05058200e+00,-6.19103091e-01,-4.84393648e-01,-2.64127908e-01,-2.83492792e-01,-2.13895816e-01,1.78453432e-15]];
Step3.18, mapping AbnormData into rank dimensions using the PCA algorithm, and executing Step3.19;
AbnormData=[[71.23996651,45.46498957,-1.21745986,-4.89679241,-0.25414131,-0.31979014,-2.88675135]];
step3.19, calculating the average value of each column of NormData, storing the average value as a list u, and executing step3.20;
u=[2.26485497e-15,7.27196081e-16,-9.15933995e-17,-7.49400542e-17,-3.88578059e-17,4.44089210e-17,2.09749146e-15];
Step3.20, calculating an inverse matrix invCov of the NormData covariance matrix, and executing step3.21;
invCov=[[2.32302503e-02,-4.38464744e-18,5.56724225e-18,-1.51830280e-16,5.58517535e-17,6.47921316e-17,9.36298368e-01],[-4.38464744e-18,8.41187742e-02,-1.33227819e-16,-6.37438566e-17,1.43954637e-16,-1.27559453e-16,7.82404275e-01],[5.56724225e-18,-1.33227819e-16,3.56641403e+00,1.47796868e-15,-4.04874223e-16,5.96567255e-16,-9.87258228e+00],[-1.51830280e-16,-6.37438566e-17,1.47796868e-15,8.10529429e+00,6.33868403e-16,5.36172840e-15,-1.79497241e+01],[5.58517535e-17,1.43954637e-16,-4.04874223e-16,6.33868403e-16,1.84594147e+01,9.22921793e-15,-1.89707015e+01],[6.47921316e-17,-1.27559453e-16,5.96567255e-16,5.36172840e-15,9.22921793e-15,2.48556248e+01,1.44491727e+01],[9.36298368e-01,7.82404275e-01,-9.87258228e+00,-1.79497241e+01,1.89707015e+01,1.44491727e+01,1.06677334e+31]];
Step3.21, calculating the row vector of differences between the monitoring indexes and their means, diff=AbnormData-u, and executing Step3.22;
diff=[[71.23996651,45.46498957,-1.21745986,-4.89679241,-0.25414131,-0.31979014,-2.88675135]]-[2.26485497e-15,7.27196081e-16,-9.15933995e-17,-7.49400542e-17,-3.88578059e-17,4.44089210e-17,2.09749146e-15];
step3.22, calculating a transposed column vector diffT of the row vector diff, and executing step3.23;
Step3.23, the service failure probability value (the degree to which the service deviates from normal) T=diff*invCov*diffT, executing Step3.24; T=8.8897778e+31;
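The product diff*invCov*diffT in Step3.23 is a quadratic form of the kind used in a squared Mahalanobis distance. The sketch below, with a hypothetical helper name, evaluates it in plain Python from the AbnormData, u and invCov values listed above. In this example the result is dominated by the last component: (-2.88675135)^2 * 1.06677334e+31 is approximately 8.89e+31, and every other term is many orders of magnitude smaller.

```python
import math

# Sketch of Step3.21-Step3.23: T = diff * invCov * diffT.
# quadratic_form is a hypothetical helper name.

def quadratic_form(diff, inv_cov):
    """Return diff * inv_cov * diff^T for a row vector diff."""
    n = len(diff)
    # Row vector times matrix, then dot product with the same vector.
    projected = [sum(diff[i] * inv_cov[i][j] for i in range(n)) for j in range(n)]
    return sum(projected[j] * diff[j] for j in range(n))


# Values copied from Step3.18, Step3.19 and Step3.20 of the example.
ABNORM = [71.23996651, 45.46498957, -1.21745986, -4.89679241,
          -0.25414131, -0.31979014, -2.88675135]
U = [2.26485497e-15, 7.27196081e-16, -9.15933995e-17, -7.49400542e-17,
     -3.88578059e-17, 4.44089210e-17, 2.09749146e-15]
INV_COV = [
    [2.32302503e-02, -4.38464744e-18, 5.56724225e-18, -1.51830280e-16, 5.58517535e-17, 6.47921316e-17, 9.36298368e-01],
    [-4.38464744e-18, 8.41187742e-02, -1.33227819e-16, -6.37438566e-17, 1.43954637e-16, -1.27559453e-16, 7.82404275e-01],
    [5.56724225e-18, -1.33227819e-16, 3.56641403e+00, 1.47796868e-15, -4.04874223e-16, 5.96567255e-16, -9.87258228e+00],
    [-1.51830280e-16, -6.37438566e-17, 1.47796868e-15, 8.10529429e+00, 6.33868403e-16, 5.36172840e-15, -1.79497241e+01],
    [5.58517535e-17, 1.43954637e-16, -4.04874223e-16, 6.33868403e-16, 1.84594147e+01, 9.22921793e-15, -1.89707015e+01],
    [6.47921316e-17, -1.27559453e-16, 5.96567255e-16, 5.36172840e-15, 9.22921793e-15, 2.48556248e+01, 1.44491727e+01],
    [9.36298368e-01, 7.82404275e-01, -9.87258228e+00, -1.79497241e+01, 1.89707015e+01, 1.44491727e+01, 1.06677334e+31],
]

DIFF = [a - m for a, m in zip(ABNORM, U)]  # Step3.21
T = quadratic_form(DIFF, INV_COV)          # Step3.23
print(T)
```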
Step3.24, calculating the in-degree of the service indicated by the service name with the index v in ServiceList in each interaction diagram of the list ServiceInteractiveList, storing the results as a list IList, and executing Step3.25;
in the diagram [Hotel: 0-Working(Cinema) 1-Working(Wineshop), Plane: 2-Working(Hotel)], only the edge Plane: 2-Working(Hotel) points to Hotel, so the in-degree of Hotel in this diagram is 1. Similarly, IList=[1, 1];
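The in-degree count of Step3.24 can be sketched on a small adjacency list. The dictionary representation (caller mapped to a list of call number, call state and callee) and the function name are assumptions made for illustration, as is the choice to count every edge regardless of its call state.

```python
# Sketch of the in-degree count in Step3.24.
# Assumed representation: caller -> [(call number, call state, callee), ...].

def in_degree(graph, service):
    """Count the edges in the adjacency list that point to `service`."""
    return sum(
        1
        for calls in graph.values()
        for _call_id, _status, callee in calls
        if callee == service
    )


# The interaction diagram from the example above.
GRAPH = {
    "Hotel": [(0, "Working", "Cinema"), (1, "Working", "Wineshop")],
    "Plane": [(2, "Working", "Hotel")],
}

print(in_degree(GRAPH, "Hotel"))  # only the edge from Plane points to Hotel
```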
Step3.25, initializing a variable EI for storing the mean service in-degree calculated from IList, initializing a variable SI for storing the variance of the service in-degree calculated from IList, and executing Step3.26; EI=1; SI=0;
Step3.26, initializing a variable IN for storing the in-degree, in the interaction diagram ServiceInteractive, of the service indicated by the service name with the index v in ServiceList, and executing Step3.27;
the in-degree IN=1, since in [Hotel: 0-End(Cinema) 1-Working(Wineshop) -Working(Plane) -End(Cinema), Plane: 2-Working(Hotel), Wineshop: 6-Working(Plane)] only the edge Plane: 2-Working(Hotel) points to Hotel;
Step3.27, the fault probability correction parameter of the service indicated by the service name with the index v in the list ServiceList is f=1+abs((IN-EI)/(SI+1)), wherein abs is the absolute value operation, and Step3.28 is executed; f=1;
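The correction parameter of Step3.25 to Step3.27 can be sketched as below. The function name is hypothetical, and the use of the population variance for SI is an assumption: the text does not say whether the population or sample variance is meant, and for IList=[1, 1] both are 0.

```python
from statistics import mean, pvariance

# Sketch of Step3.25-Step3.27: f = 1 + abs((IN - EI) / (SI + 1)).
# Assumption: SI is the population variance of the historical in-degrees.

def correction_factor(in_degree_now, in_degree_history):
    """Return the fault probability correction parameter f."""
    ei = mean(in_degree_history)       # EI: mean historical in-degree
    si = pvariance(in_degree_history)  # SI: variance of historical in-degree
    return 1 + abs((in_degree_now - ei) / (si + 1))


print(correction_factor(1, [1, 1]))  # the example above: IN=1, IList=[1, 1]
```

With an unchanged in-degree the factor stays at 1, so T is not rescaled; the larger the jump in in-degree relative to its history, the more T/f shrinks.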
Step3.28, the failure probability of the service indicated by the service name with the index v in the list ServiceList is k=1-int(exp(-x/2)/(2 x pi)/(1/2), x, T/f, +inf), wherein pi is the ratio of a circle's circumference to its diameter, exp(-x/2) is the base of the natural logarithm raised to the power -x/2, +inf is positive infinity, int() is the integration function, and int(exp(-x/2)/(2 x pi)/(1/2), x, T/f, +inf) is the integral of the function f(x)=exp(-x/2)/(2 x pi)/(1/2) with respect to x from the lower limit T/f to the upper limit positive infinity, and Step3.29 is executed;
k=1; the value range of the fault probability is 0-1;
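One reading of the integrand in Step3.28 is the chi-square density with one degree of freedom, exp(-x/2)/sqrt(2*pi*x). This is an assumption, since the expression as printed is ambiguous. Under that reading, the integral from T/f to positive infinity is the upper tail of the distribution, and k reduces to the closed form erf(sqrt((T/f)/2)), which always lies between 0 and 1. A minimal sketch with a hypothetical function name:

```python
import math

# Sketch of Step3.28 under the assumption that the integrand is the
# chi-square density with one degree of freedom:
#   k = 1 - integral_{T/f}^{+inf} exp(-x/2) / sqrt(2*pi*x) dx
#     = erf(sqrt((T/f) / 2))

def failure_probability(t, f=1.0):
    """Return k for deviation degree t and correction parameter f."""
    x = t / f
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0))


print(failure_probability(8.8897778e31, 1.0))  # the example above: T/f is huge
```

For the example values T=8.8897778e+31 and f=1 the tail integral is numerically zero, which agrees with k=1 as stated.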
step3.29, output k, execute step3.30;
Step3.30, v++, executing Step3.5.
In the above steps, before PCA dimension reduction is applied, the ADF test is used to screen the monitoring indexes that are stationary during normal operation of the service. Calculating the fault probability from the deviation of a monitoring index from normal operation presupposes that the index has a relatively stable mean value to serve as a reference during normal operation of the service; the ADF test is therefore used to screen the indexes that are stationary during normal operation, and the indexes retained in this way effectively reflect index stability, which improves the accuracy of the calculation result. Complete collinearity among monitoring indexes, for example between CPU occupancy rate and CPU idle rate, makes the calculation result inaccurate, and the determinant of the covariance matrix may be smaller than 0; reducing the data to rank dimensions with the PCA algorithm ensures that the covariance matrix is a positive semi-definite matrix. Together, these two steps eliminate complete collinearity among the monitoring indexes and can also improve calculation efficiency. In addition, using the inverse of the covariance matrix effectively removes the redundant data characteristics shared among the monitoring indexes, so the optimized calculation result is more accurate. By adding the in-degree change of the service interaction diagram, and treating the in-degree change as the change in user request volume, the abnormal deviation of the monitoring indexes is corrected, so that the service fault probability is calculated more comprehensively.
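The effect of complete collinearity described above can be illustrated with two made-up indexes, a CPU occupancy rate and a CPU idle rate that always sum to 100. The data and function name are hypothetical; exact fractions are used so that the determinant is not blurred by floating-point rounding.

```python
from fractions import Fraction

# Illustration: two perfectly collinear indexes give a singular
# (non-invertible) covariance matrix. The data below is made up.

def covariance(xs, ys):
    """Sample covariance of two equally long sequences, as an exact Fraction."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


cpu_used = [6, 4, 5, 3]                     # hypothetical occupancy rates (%)
cpu_free = [100 - v for v in cpu_used]      # idle rate is fully determined

var_used = covariance(cpu_used, cpu_used)
var_free = covariance(cpu_free, cpu_free)
cov = covariance(cpu_used, cpu_free)

# Determinant of the 2x2 covariance matrix [[var_used, cov], [cov, var_free]].
determinant = var_used * var_free - cov * cov
print(determinant)  # zero: the matrix has no inverse
```

Because the determinant is exactly zero, the inverse covariance matrix required in Step3.20 does not exist for such data unless the redundant direction is removed first.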
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (9)

1. A cloud environment service fault probability calculation method based on a service interaction diagram, characterized by comprising the following steps:
step1, extracting service running state call data in a target data set;
step2, a service interaction diagram is built according to service running state calling data, and a change record of the service interaction diagram is obtained according to the current time;
step3, calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram;
the service interaction diagram is built according to the service running state calling data, and a change record of the service interaction diagram is obtained according to the current time, and the method specifically comprises the following steps:
Step2.1, reading the service running state call data list TransferList, initializing an empty list ServiceInteractive for storing the service interaction diagram, initializing a temporary variable j=0, initializing a time variable time for storing the current time, initializing an empty list ServiceInteractiveList for storing the change record of the service interaction diagram, and executing Step2.2;
Step2.2, judging whether j is smaller than the length of the list TransferList, if the condition is met, executing Step2.3, otherwise, assigning the current time to time, adding time and ServiceInteractive into the list ServiceInteractiveList, and executing Step2.11;
Step2.3, using the record with the index j in the list TransferList, obtaining the service user name and storing it into an initialized variable SerUser, obtaining the service name and storing it into an initialized variable SerPro, obtaining the service call state and storing it into an initialized variable SerStatus, obtaining the service call number and storing it into an initialized variable CallId, initializing a temporary variable h=0, and executing Step2.4;
Step2.4, judging whether h is smaller than the length of the list ServiceInteractive, if the condition is met, executing Step2.5, otherwise, executing Step2.10;
Step2.5, initializing a String type variable ServiceUser1 for storing the service user name obtained from the record with the index h in the list ServiceInteractive, and executing Step2.6;
Step2.6, judging whether ServiceUser1 is equal to SerUser, if the condition is met, executing Step2.7, otherwise executing Step2.9;
Step2.7, replacing the record with the index h in the list ServiceInteractive with that record + CallId + "-" + SerStatus + "(" + SerPro + ")", and executing Step2.8;
Step2.8, j++, step2.2 is performed;
step2.9, h++, step2.4 is performed;
step2.10, list ServiceInteractive adds Seruser+ "" CallId+ "-" SerStatus+ "(" SerPro+ ")", execute Step2.8;
Step2.11, monitoring new service call data in real time, obtaining the service user name and storing it into an initialized variable ServiceUser, obtaining the service name and storing it into an initialized variable ServicePro, obtaining the service call state and storing it into an initialized variable ServiceStatus, obtaining the call number and storing it into an initialized variable ServiceCallId, initializing a temporary variable p=0, and executing Step2.12;
Step2.12, reading the list ServiceInteractive that stores the service interaction diagram, and executing Step2.13;
Step2.13, judging whether ServiceStatus is equal to 'Working', if the condition is met, executing Step2.14, otherwise executing Step2.20;
Step2.14, judging whether p is smaller than the length of the list ServiceInteractive, if the condition is met, executing Step2.15, otherwise executing Step2.19;
Step2.15, initializing a String type variable ServerUser for storing the service user name obtained from the record with the index p in the list ServiceInteractive, and executing Step2.16;
Step2.16, judging whether ServerUser is equal to ServiceUser, if the condition is met, executing Step2.17, otherwise, executing Step2.18;
Step2.17, replacing the data with the index p recorded by the list serviceinformation with the data with the index p recorded by the list serviceinformation +servicedetails + "-" servicedetails + ", assigning the time as the current time, adding the time and the serviceinformation into the list serviceinformation list, and executing Step3;
step2.18, p++, step2.14 is performed;
step2.19, adding ServiceElnteractive to ServiceUser+ ":" +ServiceCallId+ "(" +ServicePro+ ")", assigning time to the current time, adding time and ServiceElteractive to the list ServiceElteracthlist, and executing Step3;
step2.20, judging whether p is smaller than the length of the list ServiceInterfective, if the condition is met, executing step2.21, otherwise, executing step2.29;
step2.21, storing a service call number obtained by using data with an index p recorded by serviceindex, initializing a String type variable server, storing a service user name obtained by using data with an index p recorded by serviceindex, initializing a temporary variable q=0, and executing step2.22;
step2.22, judging whether the server is equal to the ServiceUser, if the condition is met, executing step2.23, otherwise, executing step2.28;
Step2.23, judging whether q is smaller than the length of the list ServerCallId, if the condition is satisfied, executing step2.24, otherwise, executing step2.27;
step2.24, if the data with index q recorded by list servercall is equal to serviceall, if the condition is satisfied, executing step2.25, otherwise executing step2.26;
step2.25, using the current service call data obtained by ServiceUser, servicePro, serviceStatus, serviceCallId to update servicel technical, executing Step3;
step2.26, q++, step2.23 is performed;
step2.27, replacing data with the index p recorded by the list servicelets with data with the index p recorded by the list servicelets +servicelets +"(" +servicelets +") by the list servicelets, assigning time to be the current time, adding the time and the servicelets to the list servicelets list, and executing Step3;
step2.28, p++, step2.20 is performed;
step2.29, list servicel-alternative adds serviceuser+ "-" +serviceal id+ "," +servicestatus+ "(" +servicepro+ ") to the list servicel-alternative, assigns the time to the current time, adds the time and servicel-alternative to the list servicel-alternative list, and executes Step3.
2. The cloud environment service failure probability calculation method based on the service interaction diagram according to claim 1, wherein: the extracting service running state call data in the target data set includes: reading a target data set monitored in a cloud environment, and extracting service call data from the target data set; and screening the service call data to obtain the service call data in the current running state, wherein the service call data is used as the service running state call data.
3. The cloud environment service failure probability calculation method based on the service interaction diagram according to claim 1, wherein: the step of establishing a service interaction diagram according to the service running state calling data and obtaining a change record of the service interaction diagram according to the current time comprises the following steps: acquiring service user names, service call numbers, service call states and service names according to service running state call data, establishing a service interaction diagram represented by an adjacency list, and monitoring new service call data in real time for updating the service interaction diagram; and recording the current time and acquiring a change record of the service interaction diagram.
4. The cloud environment service failure probability calculation method based on the service interaction diagram according to claim 1, wherein: calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram comprises the following steps: traversing the service names in the running state in the service interaction diagram, and sequentially reading the running data and the environment data of the service corresponding to each service name; screening out stable monitoring indexes according to the running data and the environment data of the service before the reading time; screening, according to the stable monitoring indexes, the running data and the environment data of the service before the reading time and the running data and the environment data of the service at the reading time, and calculating the degree to which the service deviates from normal; and correcting the degree of deviation of the service from normal by using the change record of the service interaction diagram and the in-degree change of the service in the service interaction diagram, to obtain the fault probability of the service.
5. The cloud environment service failure probability calculation method based on the service interaction diagram according to claim 1, wherein: according to the service interaction diagram and the change record of the service interaction diagram, calculating the probability of service failure, wherein the method comprises the following specific steps:
step3.1, initializing an empty list ColumnList for storing a monitoring data list index, initializing temporary variables g=0 and v=0, and executing step3.2;
Step3.2, reading the records of ServiceInteractiveList from before the time, and executing Step3.3;
Step3.3, reading the service interaction diagram ServiceInteractive at the time, and executing Step3.4;
Step3.4, traversing ServiceInteractive, storing the names of the running services as a list ServiceList, and executing Step3.5;
step3.5, judging whether v is smaller than the ServiceList length, executing step3.6 if the condition is met, and ending if the condition is not met;
step3.6, reading an operation data set A of the service pointed by the service name with the index v in the list ServiceList, and executing step3.7; wherein, the columns in the operation data set A of the service are the monitoring index categories;
step3.7, reading an environment data set B of the service pointed by the service name with the index v in the read list ServiceList, and executing step3.8; wherein, the columns in the environment data set B of the service are the monitoring index categories;
Step3.8, using the data sets A and B to obtain the running data and environment data, before the time, of the service indicated by the service name with the index v in the list ServiceList, storing the running data and the environment data into the list NormData, and executing Step3.9;
Step3.9, using the data sets A and B to obtain the running data and environment data, at the time, of the service indicated by the service name with the index v in the list ServiceList, storing the running data and environment data into the list AbnormData, and executing Step3.10;
Step3.10, judging whether g is smaller than the number of columns of the list NormData, if the condition is met, executing Step3.11, otherwise executing Step3.15;
Step3.11, initializing a list PDict for storing the result of the ADF test on the stationarity of column g of NormData, and executing Step3.12;
Step3.12, judging whether PDict[1] < 0.05 && PDict[0] < PDict[4]['5%'] holds, if the condition is met, executing Step3.13, otherwise executing Step3.14, wherein PDict[1] is the probability value (p-value) of the ADF test, PDict[0] is the statistic of the ADF test, and PDict[4]['5%'] is the critical value of the statistic at the 5% confidence level;
step3.13, storing the NormData column g index into a ColumnList, and executing step3.14;
step3.14, g++; step3.10 is performed;
Step3.15, screening the columns of NormData according to the indexes in ColumnList and assigning the result to NormData, screening the columns of AbnormData according to the indexes in ColumnList and assigning the result to AbnormData, and executing Step3.16;
Step3.16, calculating the rank of NormData, assigning a rank to the result, and executing step3.17;
step3.17, using PCA algorithm to reduce NormData into rank dimensions; step3.18 is performed;
Step3.18, mapping AbnormData into rank dimensions using the PCA algorithm, and executing Step3.19;
step3.19, calculating the average value of each column of NormData, storing the average value as a list u, and executing step3.20;
step3.20, calculating an inverse matrix invCov of the NormData covariance matrix, and executing step3.21;
Step3.21, calculating the row vector of differences between the monitoring indexes and their means, diff=AbnormData-u, and executing Step3.22;
Step3.22, calculating the transposed column vector diffT of the row vector diff, and executing Step3.23;
Step3.23, the degree of service deviation T=diff*invCov*diffT, executing Step3.24;
Step3.24, calculating the in-degree of the service indicated by the service name with the index v in ServiceList in each interaction diagram of the list ServiceInteractiveList, storing the results as a list IList, and executing Step3.25;
Step3.25, initializing a variable EI for storing the mean service in-degree calculated from IList, initializing a variable SI for storing the variance of the service in-degree calculated from IList, and executing Step3.26;
Step3.26, initializing a variable IN for storing the in-degree, in the interaction diagram ServiceInteractive, of the service indicated by the service name with the index v in ServiceList, and executing Step3.27;
Step3.27, the fault probability correction parameter of the service indicated by the service name with the index v in the list ServiceList is f=1+abs((IN-EI)/(SI+1)), wherein abs is the absolute value operation, and Step3.28 is executed;
Step3.28, the failure probability of the service indicated by the service name with the index v in the list ServiceList is k=1-int(exp(-x/2)/(2 x pi)/(1/2), x, T/f, +inf); wherein pi is the ratio of a circle's circumference to its diameter, exp(-x/2) is the base of the natural logarithm raised to the power -x/2, +inf is positive infinity, int() is the integration function, and int(exp(-x/2)/(2 x pi)/(1/2), x, T/f, +inf) is the integral of the function f(x)=exp(-x/2)/(2 x pi)/(1/2) with respect to x from the lower limit T/f to the upper limit positive infinity, and Step3.29 is executed;
step3.29, output k, execute step3.30;
Step3.30, v++, executing Step3.5.
6. A cloud environment service fault probability computing system based on a service interaction diagram is characterized in that: comprising the following steps:
an extraction module for executing Step1: extracting service running state calling data in a target data set;
an acquisition module for executing Step2: establishing a service interaction diagram according to service running state calling data, and acquiring a change record of the service interaction diagram according to the current time;
A calculation module for executing Step3: calculating the fault probability of the service according to the service interaction diagram and the change record of the service interaction diagram;
the service interaction diagram is built according to the service running state calling data, and a change record of the service interaction diagram is obtained according to the current time, and the method specifically comprises the following steps:
Step2.1, reading the service running state call data list TransferList, initializing an empty list ServiceInteractive for storing the service interaction diagram, initializing a temporary variable j=0, initializing a time variable time for storing the current time, initializing an empty list ServiceInteractiveList for storing the change record of the service interaction diagram, and executing Step2.2;
Step2.2, judging whether j is smaller than the length of the list TransferList, if the condition is met, executing Step2.3, otherwise, assigning the current time to time, adding time and ServiceInteractive into the list ServiceInteractiveList, and executing Step2.11;
Step2.3, using the record with the index j in the list TransferList, obtaining the service user name and storing it into an initialized variable SerUser, obtaining the service name and storing it into an initialized variable SerPro, obtaining the service call state and storing it into an initialized variable SerStatus, obtaining the service call number and storing it into an initialized variable CallId, initializing a temporary variable h=0, and executing Step2.4;
Step2.4, judging whether h is smaller than the length of the list ServiceInteractive, if the condition is met, executing Step2.5, otherwise, executing Step2.10;
Step2.5, initializing a String type variable ServiceUser1 for storing the service user name obtained from the record with the index h in the list ServiceInteractive, and executing Step2.6;
Step2.6, judging whether ServiceUser1 is equal to SerUser, if the condition is met, executing Step2.7, otherwise executing Step2.9;
Step2.7, replacing the record with the index h in the list ServiceInteractive with that record + CallId + "-" + SerStatus + "(" + SerPro + ")", and executing Step2.8;
step2.8, j++, step2.2 is performed;
step2.9, h++, step2.4 is performed;
step2.10, list ServiceInteractive adds Seruser+ "" CallId+ "-" SerStatus+ "(" SerPro+ ")", execute Step2.8;
Step2.11, monitoring new service call data in real time, obtaining the service user name and storing it into an initialized variable ServiceUser, obtaining the service name and storing it into an initialized variable ServicePro, obtaining the service call state and storing it into an initialized variable ServiceStatus, obtaining the call number and storing it into an initialized variable ServiceCallId, initializing a temporary variable p=0, and executing Step2.12;
Step2.12, reading the list ServiceInteractive that stores the service interaction diagram, and executing Step2.13;
Step2.13, judging whether ServiceStatus is equal to 'Working', if the condition is met, executing Step2.14, otherwise executing Step2.20;
Step2.14, judging whether p is smaller than the length of the list ServiceInteractive, if the condition is met, executing Step2.15, otherwise executing Step2.19;
Step2.15, initializing a String type variable ServerUser for storing the service user name obtained from the record with the index p in the list ServiceInteractive, and executing Step2.16;
Step2.16, judging whether ServerUser is equal to ServiceUser, if the condition is met, executing Step2.17, otherwise, executing Step2.18;
step2.17, replacing the data with the index p recorded by the list serviceinformation with the data with the index p recorded by the list serviceinformation +servicedetails + "-" servicedetails + ", assigning the time as the current time, adding the time and the serviceinformation into the list serviceinformation list, and executing Step3;
step2.18, p++, step2.14 is performed;
step2.19, adding ServiceElnteractive to ServiceUser+ ":" +ServiceCallId+ "(" +ServicePro+ ")", assigning time to the current time, adding time and ServiceElteractive to the list ServiceElteracthlist, and executing Step3;
Step2.20, judging whether p is smaller than the length of the list ServiceInterfective, if the condition is met, executing step2.21, otherwise, executing step2.29;
step2.21, storing a service call number obtained by using data with an index p recorded by serviceindex, initializing a String type variable server, storing a service user name obtained by using data with an index p recorded by serviceindex, initializing a temporary variable q=0, and executing step2.22;
step2.22, judging whether the server is equal to the ServiceUser, if the condition is met, executing step2.23, otherwise, executing step2.28;
step2.23, judging whether q is smaller than the length of the list ServerCallId, if the condition is satisfied, executing step2.24, otherwise, executing step2.27;
step2.24, if the data with index q recorded by list servercall is equal to serviceall, if the condition is satisfied, executing step2.25, otherwise executing step2.26;
step2.25, using the current service call data obtained by ServiceUser, servicePro, serviceStatus, serviceCallId to update servicel technical, executing Step3;
step2.26, q++, step2.23 is performed;
step2.27, replacing data with the index p recorded by the list servicelets with data with the index p recorded by the list servicelets +servicelets +"(" +servicelets +") by the list servicelets, assigning time to be the current time, adding the time and the servicelets to the list servicelets list, and executing Step3;
Step2.28, p++, step2.20 is performed;
step2.29, list servicel-alternative adds serviceuser+ "-" +serviceal id+ "," +servicestatus+ "(" +servicepro+ ") to the list servicel-alternative, assigns the time to the current time, adds the time and servicel-alternative to the list servicel-alternative list, and executes Step3.
7. The service interaction graph-based cloud environment service failure probability calculation system of claim 6, wherein: the computing module is used for traversing the names of the services in the running state in the service interaction diagram and sequentially reading the operation data and environment data of the service corresponding to each service name; for screening stable monitoring indexes according to the service operation data and environment data before the reading time; for screening, according to the stable monitoring indexes, the service operation data and environment data before the reading time and the service operation data and environment data at the reading time, and calculating the degree to which the service deviates from normal; and for correcting that deviation degree by using the change record of the service interaction diagram and the in-degree change of the service in the service interaction diagram, to obtain the fault probability of the service.
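As an illustration of the in-degree change that claim 7 refers to, the sketch below counts the incoming edges of a service in two versions of an interaction graph and returns the difference. The edge representation as (caller, callee) pairs and the function names are assumptions made for this example; the way the claimed method uses this change to correct the deviation degree is not reproduced here.

```python
from collections import Counter


def in_degrees(edges):
    """Count incoming edges per service; edges are (caller, callee) pairs."""
    return Counter(callee for _caller, callee in edges)


def in_degree_change(edges_before, edges_after, service):
    """Return the in-degree of `service` after the change minus its in-degree before."""
    # Counter yields 0 for a service that has no incoming edge.
    return in_degrees(edges_after)[service] - in_degrees(edges_before)[service]
```

For example, if services A and B both call C before a change and only A calls C afterwards, `in_degree_change` for C is negative, indicating that C has lost a caller.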
8. A terminal comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, characterized by: the processor is configured to perform the method of any of claims 1-5.
9. A computer-readable storage medium including a stored program, characterized in that: the program, when executed by a processor, causes the processor to implement the method of any one of claims 1-5.
CN202111300037.2A 2021-11-04 2021-11-04 Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram Active CN114124738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111300037.2A CN114124738B (en) 2021-11-04 2021-11-04 Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram

Publications (2)

Publication Number Publication Date
CN114124738A CN114124738A (en) 2022-03-01
CN114124738B true CN114124738B (en) 2024-03-19

Family

ID=80380518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111300037.2A Active CN114124738B (en) 2021-11-04 2021-11-04 Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram

Country Status (1)

Country Link
CN (1) CN114124738B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572856A (en) * 2014-12-17 2015-04-29 武汉科技大学 Converged storage method of service source data
CN111737033A (en) * 2020-05-26 2020-10-02 复旦大学 Micro-service fault positioning method based on runtime map analysis
CN112698975A (en) * 2020-12-14 2021-04-23 北京大学 Fault root cause positioning method and system of micro-service architecture information system
CN113032238A (en) * 2021-05-25 2021-06-25 南昌惠联网络技术有限公司 Real-time root cause analysis method based on application knowledge graph

Also Published As

Publication number Publication date
CN114124738A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
KR102214297B1 (en) Conditional validation rules
US9501504B2 (en) Automatic detection of potential data quality problems
AU2016328959A1 (en) Updating attribute data structures to indicate trends in attribute data provided to automated modeling systems
EP3418910A1 (en) Big data-based method and device for calculating relationship between development objects
CN107168995B (en) Data processing method and server
US9940215B2 (en) Automatic correlation accelerator
CN114968727B (en) Database through infrastructure fault positioning method based on artificial intelligence operation and maintenance
CN112685324B (en) Method and system for generating test scheme
US11379466B2 (en) Data accuracy using natural language processing
AU2017279795A1 (en) Metadata-driven program code generation for clinical data analysis
CN116126843A (en) Data quality evaluation method and device, electronic equipment and storage medium
CN114124738B (en) Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram
Zhang et al. On the cost of interactions in interactive visual machine learning
CN114781517A (en) Risk identification method and device and terminal equipment
US20210397538A1 (en) Diagnosing application problems by learning from fault injections
CN114443493A (en) Test case generation method and device, electronic equipment and storage medium
WO2013173422A1 (en) Method and system for collapsing functional similarities and consolidating functionally similar, interacting systems
CN113656267B (en) Device energy efficiency calculation method and device, electronic device and storage medium
CN114581693B (en) User behavior mode distinguishing method and device
CN111752984B (en) Information processing method, device and storage medium
CN114416504A (en) Performance boundary bottleneck simulation deduction method and system for cloud computing system
CN116010792A (en) Method and device for testing robustness of model
CN118760590A (en) Code change tracking method and device
CN112115124A (en) Data influence degree analysis method and device, electronic equipment and storage medium
CN114090314A (en) Service fault propagation path judgment method and device under cloud computing environment and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant