CN117194091A - Abnormal service detection method and device, storage medium and electronic equipment

Info

Publication number: CN117194091A
Application number: CN202311199532.8A
Authority: CN
Inventors: 张梦迪; 贾玉红; 陆怡; 徐聿帆
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-09-15
Filing date: 2023-09-15
Publication date: 2023-12-08

Abstract

The application discloses a method and device for detecting abnormal services, a storage medium, and electronic equipment, and relates to the technical field of artificial intelligence, the financial technology field, or other related fields. The method comprises the following steps: determining M first services in a target system, and acquiring a target data set of each first service; inputting the target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least comprises S first services with anomalies in the target system; acquiring a knowledge graph; and determining target services among the S first services according to the S first services and the knowledge graph. The application solves the problem in the related art that faulty abnormal services in a system are checked manually, which makes troubleshooting ineffective.

Description

Abnormal service detection method and device, storage medium and electronic equipment

Technical Field

The present application relates to the field of artificial intelligence, financial technology, or other related fields, and in particular, to a method and apparatus for detecting abnormal services, a storage medium, and an electronic device.

Background

In the related art, faulty abnormal services in a system are generally checked manually. In recent years, with the rapid development and wide application of technologies such as cloud computing, big data, and artificial intelligence, enterprises and organizations have placed increasingly high demands on the reliability and efficiency of IT (Information Technology) infrastructure. A manual operation and maintenance mode cannot meet these requirements: detecting faulty abnormal services depends on the knowledge and experience of operators, which is inefficient and error-prone.

For the problem in the related art that faulty abnormal services in a system are checked manually, resulting in ineffective troubleshooting, no effective solution has yet been proposed.

Disclosure of Invention

The main purpose of the application is to provide a method and device for detecting abnormal services, a storage medium, and electronic equipment, so as to solve the problem in the related art that faulty abnormal services in a system are checked manually, resulting in ineffective troubleshooting.

In order to achieve the above object, according to one aspect of the present application, there is provided a method of detecting an abnormal service. The method comprises the following steps: determining M first services in a target system, and acquiring a target data set of each first service, wherein the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set at least comprises N target data, the target data comprise performance data of the first services, and M and N are positive integers greater than 1; inputting a target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least comprises S first services with anomalies in the target system, and S is a positive integer greater than 1; acquiring a knowledge graph, wherein the knowledge graph at least comprises information of the M first services and a calling relationship between every two first services in the M first services; and determining target services in the S first services according to the S first services and the knowledge graph, wherein the target services are services for calling T second services, the second services are services except the S first services in the M first services, and T is a positive integer.

Further, determining the target service in the S first services according to the S first services and the knowledge graph includes: performing traversal processing on each first service in the S first services according to the knowledge graph, and determining whether the service called by each first service in the S first services is the second service or not to obtain S traversal results; and if the S traversal results indicate that the service calling the second service exists in the S first services, the service calling the second service in the S first services is used as the target service.

Further, inputting the target data set of each first service into a target prediction model for prediction processing to obtain a prediction result comprises: preprocessing data in the target data set of each first service to obtain preprocessed data, wherein the preprocessing is at least one of the following: aggregation processing, null-value filling processing, and normalization processing; and clustering the preprocessed data using the unsupervised learning algorithm to obtain the prediction result, wherein the prediction result at least comprises the S first services with anomalies in the target system.

Further, if the preprocessing is aggregation processing, preprocessing the data in the target data set of each first service to obtain preprocessed data, where the preprocessing includes: acquiring preset time period information; and carrying out aggregation processing on the data in the target data set of each first service according to the preset time period information to obtain the preprocessed data.

Further, the target prediction model is obtained by: acquiring a sample training set, wherein the sample training set at least comprises Q sample data, the sample data comprises performance data of the target system acquired in a history process, and Q is a positive integer greater than 1; and performing unsupervised training on the original prediction model by using the sample training set based on the unsupervised learning algorithm to obtain the target prediction model.

Further, obtaining the knowledge graph includes: acquiring attribute information of the M first services; determining a calling relationship between every two first services in the M first services; and acquiring the knowledge graph based on the attribute information of the M first services and the calling relation between every two first services.

Further, if the prediction result further includes anomaly probability and anomaly performance index data corresponding to each of the S first services, after determining the target service in the S first services according to the S first services and the knowledge graph, the method further includes: acquiring attribute information of the target service; summarizing attribute information of the target service, abnormal probability corresponding to each first service in the S first services and abnormal performance index data to obtain a data information set, and sending the data information set to a target object, wherein the data information set is used for prompting the target object to process the S first services with the abnormality in the target system according to the data information set.

In order to achieve the above object, according to another aspect of the present application, there is provided a detection apparatus of abnormal services. The device comprises: the first processing unit is used for determining M first services in a target system and acquiring a target data set of each first service, wherein the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set at least comprises N target data, the target data comprise performance data of the first services, and M and N are positive integers greater than 1; the first input unit is used for inputting the target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least comprises S first services with anomalies in the target system, and S is a positive integer greater than 1; the first acquisition unit is used for acquiring a knowledge graph, wherein the knowledge graph at least comprises information of the M first services and calling relations between every two first services in the M first services; the first determining unit is configured to determine a target service in the S first services according to the S first services and the knowledge graph, where the target service is a service that invokes T second services, the second services are services other than the S first services in the M first services, and T is a positive integer.

Further, the first determination unit includes: the first processing module is used for performing traversal processing on each first service in the S first services according to the knowledge graph, determining whether the service called by each first service in the S first services is the second service or not, and obtaining S traversal results; and the first determining module is used for taking the service which calls the second service in the S first services as the target service if the S traversal results indicate that the service which calls the second service exists in the S first services.

Further, the first input unit includes: the second processing module, used for preprocessing the data in the target data set of each first service to obtain preprocessed data, wherein the preprocessing is at least one of the following: aggregation processing, null-value filling processing, and normalization processing; and the third processing module, used for clustering the preprocessed data using the unsupervised learning algorithm to obtain the prediction result, wherein the prediction result at least comprises the S first services with anomalies in the target system.

Further, if the preprocessing is aggregation processing, the second processing module includes: the first acquisition submodule is used for acquiring preset time period information; and the first processing sub-module is used for carrying out aggregation processing on the data in the target data set of each first service according to the preset time period information to obtain the preprocessed data.

Further, the target prediction model is obtained by: the second acquisition unit is used for acquiring a sample training set, wherein the sample training set at least comprises Q sample data, the sample data comprises performance data of the target system acquired in a history process, and Q is a positive integer greater than 1; and the first training unit is used for performing unsupervised training on the original prediction model by using the sample training set based on the unsupervised learning algorithm to obtain the target prediction model.

Further, the first acquisition unit includes: the first acquisition module is used for acquiring attribute information of the M first services; a second determining module, configured to determine a calling relationship between every two first services in the M first services; and the second acquisition module is used for acquiring the knowledge graph based on the attribute information of the M first services and the calling relation between every two first services.

Further, if the prediction result further includes abnormal probability and abnormal performance index data corresponding to each of the S first services, the apparatus further includes: a third obtaining unit, configured to obtain attribute information of a target service in the S first services after determining the target service according to the S first services and the knowledge graph; the second processing unit is used for summarizing the attribute information of the target service, the abnormal probability corresponding to each first service in the S first services and the abnormal performance index data to obtain a data information set, and sending the data information set to a target object, wherein the data information set is used for prompting the target object to process the S first services with the abnormality in the target system according to the data information set.

In order to achieve the above object, according to another aspect of the present application, there is provided a computer-readable storage medium storing a program, wherein the program performs the method of detecting an abnormal service as set forth in any one of the above.

In order to achieve the above object, according to another aspect of the present application, there is provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for detecting an abnormal service as set forth in any one of the above.

According to the application, the following steps are adopted: determining M first services in a target system and acquiring a target data set of each first service, wherein the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set at least comprises N target data, the target data comprise performance data of the first service, and M and N are positive integers greater than 1; inputting a target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the prediction result at least comprises S first services with anomalies in a target system, the target prediction model is a model constructed based on an unsupervised learning algorithm, and S is a positive integer greater than 1; acquiring a knowledge graph, wherein the knowledge graph at least comprises information of M first services and calling relations between every two first services in the M first services; determining target services in the S first services according to the S first services and the knowledge graph, wherein the target services are services for calling T second services, the second services are services except the S first services in the M first services, and T is a positive integer, so that the problem that in the related art, the effect of troubleshooting abnormal services with faults in a system is poor due to the fact that the abnormal services with faults are troubleshooted in a manual mode is solved. 
That is, the performance data set of each service in the system to be detected is input into a prediction model for prediction processing, which predicts the services in the system that have faults. A knowledge graph is then acquired, which includes information on the services in the system and the calling relationship between every two of them. The root-cause abnormal service is then determined from the predicted faulty services and the knowledge graph, without manually checking the faulty abnormal services. This improves the efficiency and accuracy of troubleshooting faulty abnormal services in the system, and thus the overall troubleshooting effect.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:

FIG. 1 is a flow chart of a method for detecting an abnormal service provided according to an embodiment of the present application;

FIG. 2 is a flowchart of an intelligent service fault location method based on an unsupervised algorithm and a knowledge graph according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an unsupervised algorithm construction flow in an embodiment of the present application;

FIG. 4 is a schematic diagram of a knowledge graph structure in an embodiment of the present application;

FIG. 5 is a schematic diagram of a process of determining a traceability service according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a detection apparatus for abnormal services provided according to an embodiment of the present application;

FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

In order that those skilled in the art may better understand the present application, the technical solution in the embodiments of the present application is described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that, related information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by a user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.

For convenience of description, the following will describe some terms or terminology involved in the embodiments of the present application:

Unsupervised algorithm: an unsupervised algorithm is a machine learning technique that models and analyzes data without predefined labels or target variables.

Knowledge graph: knowledge graph technology converts data into a knowledge representation in the form of a network topology graph, forming an extensible and coherent knowledge network through links among entities, relations, and other information.

Intelligent service fault location: intelligent service fault location is a method for quickly locating application and service faults using data and artificial intelligence techniques.

System services are background programs or processes running in a computer operating system that provide system-level functionality and services. System services are typically launched automatically at operating system start-up and run in the background, providing functions and services that support the normal operation of users and applications. Users can manage and configure these services through a system configuration tool or a command-line interface.

The K-Means algorithm is a common clustering algorithm used to divide a set of data into K non-overlapping clusters. The basic idea of the algorithm is to continuously optimize the central points of the clusters in an iterative manner, and assign each data point to the cluster to which the closest central point belongs.

The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is a density-based spatial clustering algorithm. It divides the data space into clusters of densely connected data points and marks data points in regions of insufficient density as noise points.

Isolation Forest is an anomaly detection algorithm based on ensemble learning for identifying outlier data points. It isolates outliers by randomly partitioning the data to construct trees.
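The K-Means iteration defined above (assign each point to its closest center, then recompute the centers) can be illustrated with a minimal pure-Python sketch. The function name `k_means`, the two-dimensional sample values, and the choice of initial centers are assumptions made for illustration; they are not taken from the application.

```python
from math import dist

def k_means(points, centers, max_iter=100):
    """Lloyd's iteration: assign each point to its nearest center, then
    move each center to the mean of the points assigned to it."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest center for every point.
        labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in points]
        # Update step: recompute each center as the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    return labels, centers

# Hypothetical (average response ms, failed calls per minute) samples.
points = [(10.0, 0.0), (12.0, 1.0), (11.0, 0.0), (95.0, 20.0), (100.0, 22.0)]
labels, centers = k_means(points, centers=[points[0], points[3]])
```

With these assumed samples, the three low-latency points end up in one cluster and the two high-latency points in the other.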

The present application is described below with reference to preferred implementation steps. FIG. 1 is a flowchart of a method for detecting an abnormal service provided according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:

Step S101, M first services in a target system are determined, and a target data set of each first service is obtained, wherein the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set at least comprises N target data, the target data comprise performance data of the first services, and M and N are positive integers greater than 1.

For example, performance data of each service (each of the M first services) in the monitored system (the target system) may be acquired first. The performance data may include the per-minute maximum response time, average response time, total call volume, number of successful calls, number of failed calls, concurrency, and the like of the service.
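One possible way to hold a single per-minute sample of the performance data listed above is a small record type. This is only a sketch: the class name `ServiceMetrics`, the field names, the units, and the derived `failure_rate` helper are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceMetrics:
    """One per-minute performance sample for one service (assumed schema)."""
    service: str
    minute: str              # assumed timestamp format, e.g. "2023-09-15T10:31"
    max_response_ms: float
    avg_response_ms: float
    total_calls: int
    success_calls: int
    failed_calls: int
    concurrency: int

    def failure_rate(self) -> float:
        """Share of failed calls; 0.0 when the service received no calls."""
        return self.failed_calls / self.total_calls if self.total_calls else 0.0

# Hypothetical sample for a service named "payment".
sample = ServiceMetrics("payment", "2023-09-15T10:31", 850.0, 120.0, 200, 190, 10, 16)
```

A target data set for one service would then be a list of at least N such records.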

Step S102, inputting a target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least comprises S first services with anomalies in a target system, and S is a positive integer greater than 1.

For example, the acquired performance data of each service (the first service) in the system (the target system) may be input to a prediction model (the target prediction model) to predict a plurality of abnormal services (the S first services) in the system. The prediction model may be trained and built using an unsupervised learning algorithm, such as the K-Means algorithm, the DBSCAN algorithm, or the Isolation Forest algorithm.
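As one illustration of how a density-based algorithm such as DBSCAN could separate services, the following pure-Python sketch labels each point with a cluster index, or with -1 when it lies in a sparse region. Treating the -1 (noise) points as candidate anomalies is an assumption of this sketch, not something the application states; the function name `dbscan`, the parameter values, and the sample points are likewise hypothetical.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if eps >= dist(points[i], points[j])]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if min_pts > len(seeds):
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical normalized feature vectors, one per service.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Here the four nearby points form one cluster and the distant point is labeled as noise.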

Step S103, a knowledge graph is obtained, wherein the knowledge graph at least comprises information of M first services and calling relations between every two first services in the M first services.

For example, a service call link knowledge graph (the knowledge graph described above) may be constructed based on the service call relationship data in the system (the target system described above). The entities in the graph may include services; the entity attributes include, but are not limited to, the park to which a service belongs and the cluster to which it belongs; the relations may include service calling relationships; and the relation attributes may include the identifier of the service line to which a calling relationship belongs.

Step S104, determining target services in the S first services according to the S first services and the knowledge graph, wherein the target services are services for calling T second services, the second services are services except the S first services in the M first services, and T is a positive integer.

For example, based on the plurality of abnormal services (the S first services described above) predicted by the prediction model (the target prediction model described above), a graph query traversal algorithm may be invoked on the constructed service call link knowledge graph (the knowledge graph described above) to find the root-cause abnormal service node (the target service described above). The specific logic may be as follows: starting from the service for which the system raised an abnormality alarm, traverse to its lower-level nodes; if a lower-level node is an abnormal service (a service among the S first services), continue traversing from it; if a lower-level node is a normal node (a second service described above), the abnormal node that calls it is determined to be the root-cause fault node.
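Read literally, the definition in step S104 says that a target service is an abnormal service that calls at least one service outside the abnormal set. A minimal sketch of that set-based reading follows; the function name `find_target_services`, the dictionary representation of the calling relationships, and the service names are assumptions for illustration.

```python
def find_target_services(abnormal, calls):
    """Return the abnormal services that call at least one service outside
    the abnormal set (a 'second service' in the terms of step S104)."""
    abnormal = set(abnormal)
    return {
        svc for svc in abnormal
        if any(callee not in abnormal for callee in calls.get(svc, ()))
    }

# Hypothetical call chain: gateway -> order -> payment -> ledger.
calls = {
    "gateway": ["order"],
    "order": ["payment"],
    "payment": ["ledger"],
    "ledger": [],
}
abnormal = {"gateway", "order", "payment"}  # assumed model output (S = 3)
targets = find_target_services(abnormal, calls)
```

In this assumed chain only `payment` calls a normal service (`ledger`), so it is the only target service returned.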

It should be noted that the method for detecting abnormal services provided by the embodiment of the application can be applied to financial scenarios.

Through steps S101 to S104, the performance data set of each service in the system to be detected is input into the prediction model for prediction processing, which predicts the services in the system that have faults. A knowledge graph is then acquired, which includes information on the services in the system and the calling relationship between every two of them. The root-cause abnormal service is then determined from the predicted faulty services and the knowledge graph, without manually checking the faulty abnormal services. This improves the efficiency and accuracy of troubleshooting faulty abnormal services in the system, and thus the overall troubleshooting effect.

Optionally, in the method for detecting abnormal services provided in the embodiment of the present application, inputting the target data set of each first service into the target prediction model for prediction processing to obtain the prediction result includes: preprocessing data in the target data set of each first service to obtain preprocessed data, wherein the preprocessing is at least one of the following: aggregation processing, null-value filling processing, and normalization processing; and clustering the preprocessed data using an unsupervised learning algorithm to obtain a prediction result, wherein the prediction result at least comprises the S first services with anomalies in the target system.

For example, in predicting abnormal services in a system through a model, all the fault services may be identified based on the monitoring data (the target data set of each first service described above), using an unsupervised learning algorithm such as a K-Means algorithm, a DBSCAN algorithm, an Isolation Forest algorithm, and the like. The method specifically comprises the following steps:

(1) Data preparation: all data is prepared and processed into the format required by the model. The data includes, but is not limited to, the per-minute maximum response time, average response time, total call volume, number of successful calls, number of failed calls, and concurrency of each service over a period of time before the service abnormality;

(2) Feature processing: the service indicator data within a given time period is aggregated by minute, and null-value filling and normalization are applied to the data;

(3) Clustering: an unsupervised algorithm is called to cluster the services. Based on the abnormal service alarm raised by the system, the services that fall into the same cluster as the alarmed abnormal service are selected as output. The output comprises the abnormal services, the abnormal probability of each abnormal service, and a list of the abnormal service indicators.
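The feature processing and selection steps above can be sketched as follows. The application does not say how nulls are filled or which normalization is used, so mean filling and min-max scaling are assumptions of this sketch, as are the function names and the example cluster labels (which stand in for the output of whatever clustering algorithm is chosen).

```python
def fill_nulls(column):
    """Replace None with the mean of the observed values (0.0 if none observed)."""
    seen = [v for v in column if v is not None]
    mean = sum(seen) / len(seen) if seen else 0.0
    return [mean if v is None else v for v in column]

def min_max(column):
    """Scale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]

def same_cluster_as_alarm(labels, alarm_service):
    """Given a service -> cluster label mapping, return the services that
    share a cluster with the service that raised the alarm."""
    target = labels[alarm_service]
    return sorted(s for s, lab in labels.items() if lab == target)

# Hypothetical per-minute average response times with one missing value.
avg_response = [100.0, None, 300.0, 200.0]
scaled = min_max(fill_nulls(avg_response))

# Hypothetical clustering output; "gateway" is the alarmed service.
cluster_labels = {"gateway": 1, "order": 1, "payment": 1, "ledger": 0, "search": 0}
suspects = same_cluster_as_alarm(cluster_labels, "gateway")
```

The selected services correspond to the abnormal service list that is passed on to the knowledge graph traversal.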

In summary, by using the prediction model constructed by the unsupervised learning algorithm, the abnormal service with the fault in the system can be rapidly and accurately predicted.

Optionally, in the method for detecting abnormal services provided in the embodiment of the present application, if preprocessing is aggregation processing, preprocessing data in a target data set of each first service to obtain preprocessed data includes: acquiring preset time period information; and carrying out aggregation processing on the data in the target data set of each first service according to the preset time period information to obtain preprocessed data.

For example, if the performance data of each service in the system is collected once per second, the collected data may be aggregated in units of one minute during preprocessing. That is, after the performance data of each service is collected, the service indicator data within a given time period may be aggregated by minute.
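A minimal sketch of such minute-level aggregation is shown below. It assumes that the raw data for one service arrives as `(epoch_second, response_ms, ok)` tuples and that the preset time period is 60 seconds; the tuple layout, the function name `aggregate_per_minute`, and the output keys are hypothetical.

```python
from collections import defaultdict

def aggregate_per_minute(samples, period=60):
    """Group raw samples of one service into buckets of `period` seconds and
    compute per-bucket indicators keyed by the bucket's start time."""
    buckets = defaultdict(list)
    for ts, response_ms, ok in samples:
        buckets[ts - ts % period].append((response_ms, ok))
    result = {}
    for start, rows in sorted(buckets.items()):
        times = [r for r, _ in rows]
        success = sum(1 for _, ok in rows if ok)
        result[start] = {
            "max_response_ms": max(times),
            "avg_response_ms": sum(times) / len(times),
            "total_calls": len(rows),
            "success_calls": success,
            "failed_calls": len(rows) - success,
        }
    return result

# Hypothetical raw samples: three in the first minute, one in the second.
samples = [(0, 100.0, True), (30, 300.0, False), (59, 200.0, True), (60, 50.0, True)]
per_minute = aggregate_per_minute(samples)
```

Passing a different `period` value models a different preset time period.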

By the scheme, the collected service performance data can be conveniently aggregated.

Optionally, in the method for detecting an abnormal service provided by the embodiment of the present application, the target prediction model is obtained by: acquiring a sample training set, wherein the sample training set at least comprises Q sample data, the sample data comprises performance data of a target system acquired in a history process, and Q is a positive integer greater than 1; based on an unsupervised learning algorithm, performing unsupervised training on the original prediction model by using a sample training set to obtain a target prediction model.

For example, before using a predictive model (target predictive model described above) to predict abnormal services in a system (target system described above), performance data for each service may be collected first, and the collected data may be used as a sample training set, and then the original predictive model may be unsupervised trained using an unsupervised learning algorithm and the sample training set, and a trained predictive model (target predictive model described above) may be obtained.

In summary, by using the unsupervised learning algorithm, a prediction model can be quickly and accurately constructed, and abnormal services with faults in the system can be predicted by using the prediction model.

Optionally, in the method for detecting an abnormal service provided by the embodiment of the present application, acquiring a knowledge graph includes: acquiring attribute information of M first services; determining a calling relation between every two first services in the M first services; and acquiring a knowledge graph based on the attribute information of the M first services and the calling relation between every two first services.

For example, a service call link knowledge graph (the knowledge graph described above) may be constructed based on call relationship data of services in a system (the target system described above). The graph includes, but is not limited to, service nodes, call relationships between services, the service call links to which each service belongs, and the like. The graph is updated when a service or a service call link changes. Specifically, the entities in the graph include services; the entity attributes include, but are not limited to, the park to which a service belongs and the cluster to which a service belongs; the relationships include service call relationships; and the relationship attributes include the identifier of the service line to which a call relationship belongs.
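The graph structure described above (service entities with park and cluster attributes, and call relationships carrying a service line identifier) may be represented, for example, as in the following sketch. The class names `ServiceNode` and `CallGraph`, their methods, and the sample values are hypothetical; a production system might instead use a graph database.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNode:
    """Entity: a service with its park and cluster attributes."""
    name: str
    park: str
    cluster: str


@dataclass
class CallGraph:
    nodes: dict = field(default_factory=dict)
    # caller name -> list of (callee name, service line identifier)
    edges: dict = field(default_factory=dict)

    def add_service(self, name, park, cluster):
        self.nodes[name] = ServiceNode(name, park, cluster)
        self.edges.setdefault(name, [])

    def add_call(self, caller, callee, service_line):
        """Relationship: `caller` calls `callee` within `service_line`."""
        self.edges[caller].append((callee, service_line))

    def callees(self, caller, service_line):
        """Services called by `caller`, restricted to one service line."""
        return [c for c, line in self.edges.get(caller, []) if line == service_line]


# Hypothetical services and service lines.
graph = CallGraph()
graph.add_service("a", park="park a", cluster="cluster a")
graph.add_service("b", park="park a", cluster="cluster a")
graph.add_service("c", park="park b", cluster="cluster b")
graph.add_call("a", "b", service_line="line-1")
graph.add_call("a", "c", service_line="line-2")
```

Updating the graph when a service or call link changes then amounts to calling `add_service` or `add_call` again, or removing the corresponding entries.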

By the scheme, the service call link knowledge graph can be quickly and accurately constructed.

Optionally, in the method for detecting an abnormal service provided by the embodiment of the present application, determining, according to S first services and a knowledge graph, a target service in the S first services includes: traversing each first service in the S first services according to the knowledge graph, determining whether the service called by each first service in the S first services is a second service, and obtaining S traversing results; and if the S traversal results indicate that the service calling the second service exists in the S first services, the service calling the second service in the S first services is used as the target service.

For example, a graph query traversal algorithm may be invoked to find the root abnormal service node based on the abnormal service list output by the prediction model (the target prediction model described above). Specifically, starting from the service for which the system raised an abnormality alarm, the traversal proceeds to the lower-level (called) nodes. If a lower-level node is an abnormal service, the traversal continues from that node; if the lower-level node is normal, the current node is judged to be the source fault node. During this judgment, the service line identifier of the call relationship needs to be restricted to the designated service line.
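One possible reading of this traversal is sketched below. It assumes that the call relationships have already been filtered to the designated service line, and that an abnormal service is treated as a root when none of the services it calls is abnormal (either all of them are normal or it calls nothing). The function name `trace_roots` and the sample graph are hypothetical.

```python
def trace_roots(calls, abnormal, start):
    """Walk downstream from the alarming service and collect root faults.

    `calls` maps each service to the services it calls (already restricted
    to one service line); `abnormal` is the set output by the model.
    """
    roots, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node not in abnormal:
            continue
        seen.add(node)
        abnormal_callees = [c for c in calls.get(node, []) if c in abnormal]
        if abnormal_callees:
            # The fault may lie further down; keep traversing.
            stack.extend(abnormal_callees)
        else:
            # No abnormal service below this one: treat it as a root fault.
            roots.append(node)
    return roots


# Hypothetical call graph; "c" and "d" are normal services.
calls = {"a": ["b", "c"], "b": ["d", "e"], "e": ["f"], "f": ["g"], "g": []}
roots_leaf = trace_roots(calls, {"a", "b", "e", "f", "g"}, "a")
roots_mid = trace_roots(calls, {"a", "b", "e", "f"}, "a")
```

In the first call the chain of abnormal services ends at a service that calls nothing, and in the second it ends at a service whose only callee is normal; both cases stop the traversal.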

By the scheme, the root fault service in the system can be rapidly and accurately determined.

Optionally, in the method for detecting abnormal services provided in the embodiment of the present application, if the prediction result further includes abnormal probability and abnormal performance index data corresponding to each of the S first services, after determining the target service in the S first services according to the S first services and the knowledge graph, the method further includes: acquiring attribute information of a target service; summarizing attribute information of the target service, abnormal probability corresponding to each first service in the S first services and abnormal performance index data to obtain a data information set, and sending the data information set to the target object, wherein the data information set is used for prompting the target object to process the S first services with the abnormality in the target system according to the data information set.

For example, after the source abnormal service node is found, the abnormal service root node entity, and the cluster and park to which the entity belongs (the attribute information of the target service described above) may be output according to the service call link knowledge graph (the knowledge graph described above). The results may then be integrated, that is, the output source abnormal service, the cluster to which the abnormal service belongs, the park to which the abnormal service belongs, the probability of the abnormal service (the probability of abnormality corresponding to each first service described above) output by the prediction model (the target prediction model described above), and the index information of the abnormal service (the performance index data of abnormality corresponding to each first service described above) output by the prediction model (the target prediction model described above) may be integrated together, and the data information set described above may be obtained. The data information set can then be sent to an operation and maintenance person (the target object) and prompt the operation and maintenance person to directly contact the service related person for troubleshooting based on the output data information set.
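The integration of results into a data information set may look, for example, like the sketch below. The field names, the function `build_data_information_set`, and all sample values are hypothetical; the application does not prescribe a message format, and JSON is used here only as one convenient choice.

```python
import json


def build_data_information_set(root, attributes, predictions):
    """Merge root-service attributes with the model output for all abnormal services."""
    root_attrs = attributes[root]
    return {
        "root_service": root,
        "cluster": root_attrs["cluster"],
        "park": root_attrs["park"],
        "abnormal_services": [
            {
                "service": name,
                "probability": info["probability"],
                "abnormal_metrics": info["abnormal_metrics"],
            }
            for name, info in sorted(predictions.items())
        ],
    }


# Hypothetical attribute information and model output.
attributes = {"g": {"cluster": "cluster a", "park": "park a"}}
predictions = {
    "f": {"probability": 0.91, "abnormal_metrics": {"avg_response_ms": 850.0}},
    "g": {"probability": 0.97, "abnormal_metrics": {"call_failures": 42}},
}
report = build_data_information_set("g", attributes, predictions)
# Serialized form that could be sent to operation and maintenance personnel.
message = json.dumps(report, ensure_ascii=False, indent=2)
```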

By the scheme, the information of the abnormal service in the system can be conveniently sent to operation and maintenance personnel, so that the abnormal service in the system can be rapidly and accurately processed.

Moreover, intelligent operation and maintenance is an operation and maintenance mode which utilizes advanced technical means to improve the reliability and the high efficiency of the system. The machine learning and knowledge graph technology is used as an important supporting technology for intelligent operation and maintenance, and can effectively assist in automatic operation and maintenance.

In addition, an unsupervised algorithm can build an anomaly detection and prediction model from historical system metric data, thereby enabling automatic monitoring and fault diagnosis of the system. Knowledge graph technology can integrate system data from multiple sources into a comprehensive knowledge base, providing solid technical support for rapid fault localization.

For example, fig. 2 is a flowchart of an intelligent service fault localization method based on an unsupervised algorithm and a knowledge graph according to an embodiment of the present application. As shown in fig. 2, the method includes the following steps:

Step one: model prediction. Based on the monitored data, all faulty services are identified using an unsupervised learning algorithm, such as the K-Means algorithm, the DBSCAN algorithm, or the Isolation Forest algorithm. For example, fig. 3 is a schematic diagram of the unsupervised algorithm construction flow in an embodiment of the present application. As shown in fig. 3, the construction flow may specifically be:

(1) Data preparation: all data is prepared and processed into the format required by the model. The data includes, but is not limited to, information such as maximum response time per minute, average response time, total call volume, call success number, call failure number, concurrency number, etc. of the service in a period of time before the service abnormality.

(2) Feature processing: the service index data within a given time period is aggregated by minute, and null value filling and normalization are performed on the data.
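The null value filling and normalization mentioned in the feature processing step can be sketched as follows. The application does not state which filling or normalization strategy is used; filling with the mean of the observed values and min-max scaling are assumptions chosen for illustration, as are the function names.

```python
def fill_nulls(column, default=0.0):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = sum(observed) / len(observed) if observed else default
    return [fill if v is None else v for v in column]


def min_max_normalize(column):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical per-minute average response times with one missing minute.
raw = [10.0, None, 30.0, 20.0]
filled = fill_nulls(raw)
normalized = min_max_normalize(filled)
```

Normalization matters for distance-based algorithms such as K-Means and DBSCAN, because otherwise an index with a large numeric range (for example total call volume) would dominate the distance.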

Step two: graph construction. A service call link knowledge graph is constructed based on the service call relationship data. The graph includes, but is not limited to, service nodes, call relationships between services, the service call links to which each service belongs, and the like. The graph is updated when a service or a service call link changes. For example, fig. 4 is a schematic diagram of the graph structure of the knowledge graph in an embodiment of the present application. As shown in fig. 4, the entities in the knowledge graph include services; the entity attributes include, but are not limited to, the park to which a service belongs and the cluster to which a service belongs; the relationships include service call relationships; and the relationship attributes include the identifier of the service line to which a call relationship belongs.

Step three: root service tracing. Based on the abnormal service list, a graph query traversal algorithm is invoked to find the root abnormal service node. Specifically, starting from the service for which the system raised an abnormality alarm, the traversal proceeds to the lower-level nodes. If a lower-level node is an abnormal service, the traversal continues. If the lower-level node is a normal node, or if the current node has no lower-level node (that is, it is the last node and does not call any other node), the current node is judged to be the source fault node. During this judgment, the service line identifier needs to be restricted to the designated service line. Finally, the abnormal service root node entity and the cluster and park to which the entity belongs are output. For example, fig. 5 is a schematic diagram of the root service tracing flow in an embodiment of the present application. As shown in fig. 5, a service in a solid circle represents an abnormal service, and a service in a dotted circle represents a normal service. The tracing link is a-b-e-f-g (park a, cluster a). Service g is the last node of the tracing link in the knowledge graph and has no lower-level node, that is, it does not call any other node, so the root fault service finally traced is service g. In addition, the service g in the solid circle and the service g in the dotted circle in fig. 5 may represent services in different parks; for example, the service g in the solid circle belongs to park a, while the service g in the dotted circle may belong to park b.

Step four: result integration. The results of step one and step three are combined to output the root abnormal service, the cluster and the park to which the abnormal service belongs, the probability of the abnormal service, and the index information of the abnormal service. Based on this output, operation and maintenance personnel can directly contact the personnel responsible for the service to troubleshoot the fault.

Therefore, the method provided by the embodiment of the present application, on the one hand, helps a user locate the root fault service in time, makes troubleshooting of the root fault service more focused, and reduces the MTTR (mean time to repair); on the other hand, with the reference information output by the model, problems can be located and resolved more quickly and accurately than by purely manual troubleshooting, which reduces the impact of faults on the service.

In summary, in the method for detecting abnormal services provided by the embodiment of the present application, by determining M first services in a target system, and acquiring a target data set of each first service, where the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set includes at least N target data, the target data includes performance data of the first services, and M and N are positive integers greater than 1; inputting a target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the prediction result at least comprises S first services with anomalies in a target system, the target prediction model is a model constructed based on an unsupervised learning algorithm, and S is a positive integer greater than 1; acquiring a knowledge graph, wherein the knowledge graph at least comprises information of M first services and calling relations between every two first services in the M first services; determining target services in the S first services according to the S first services and the knowledge graph, wherein the target services are services for calling T second services, the second services are services except the S first services in the M first services, and T is a positive integer, so that the problem that in the related art, the effect of troubleshooting abnormal services with faults in a system is poor due to the fact that the abnormal services with faults are troubleshooted in a manual mode is solved. 
Specifically, the performance data set of each service in the system to be detected is input into the prediction model for prediction processing, so that the plurality of faulty services in the system to be detected are predicted. A knowledge graph is then acquired, which includes information of the plurality of services in the system to be detected and the calling relationship between every two of those services. The source abnormal service is then determined according to the plurality of faulty services and the knowledge graph. Because faulty abnormal services no longer need to be checked manually, the efficiency and accuracy of troubleshooting faulty abnormal services in the system are improved, which in turn improves the overall troubleshooting effect.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that illustrated herein.

The embodiment of the application also provides a device for detecting the abnormal service, and the device for detecting the abnormal service can be used for executing the method for detecting the abnormal service provided by the embodiment of the application. The following describes a detection device for abnormal services provided by the embodiment of the present application.

Fig. 6 is a schematic diagram of a detection apparatus of an abnormal service according to an embodiment of the present application. As shown in fig. 6, the apparatus includes: a first processing unit 601, a first input unit 602, a first acquisition unit 603, and a first determination unit 604.

Specifically, a first processing unit 601 is configured to determine M first services in a target system, and obtain a target data set of each first service, where the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set includes at least N target data, the target data includes performance data of the first services, and M and N are positive integers greater than 1;

The first input unit 602 is configured to input the target data set of each first service into a target prediction model for prediction processing, so as to obtain a prediction result, where the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least includes S first services having an anomaly in a target system, and S is a positive integer greater than 1;

a first obtaining unit 603, configured to obtain a knowledge graph, where the knowledge graph at least includes information of M first services and a calling relationship between every two first services in the M first services;

the first determining unit 604 is configured to determine a target service in the S first services according to the S first services and the knowledge graph, where the target service is a service that invokes T second services, and the second service is a service other than the S first services in the M first services, and T is a positive integer.
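The cooperation of the four units may be pictured, for example, as the pipeline sketched below. The class `AbnormalServiceDetector`, the stub callables, and the fixed threshold standing in for the unsupervised prediction model are all hypothetical placeholders; only the order of the four steps follows the description above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class AbnormalServiceDetector:
    collect: Callable[[], Dict[str, list]]                # role of unit 601
    predict: Callable[[Dict[str, list]], Set[str]]        # role of unit 602
    load_graph: Callable[[], Dict[str, List[str]]]        # role of unit 603
    locate: Callable[[Set[str], Dict[str, List[str]]], List[str]]  # role of unit 604

    def run(self) -> List[str]:
        datasets = self.collect()
        abnormal = self.predict(datasets)
        graph = self.load_graph()
        return self.locate(abnormal, graph)


def locate_targets(abnormal, graph):
    """Abnormal services that call at least one service outside the abnormal set."""
    return sorted(
        s for s in abnormal
        if any(callee not in abnormal for callee in graph.get(s, []))
    )


detector = AbnormalServiceDetector(
    collect=lambda: {"a": [900.0], "b": [950.0], "c": [20.0]},
    # Placeholder for the unsupervised model: a fixed threshold, for illustration only.
    predict=lambda data: {name for name, values in data.items() if max(values) > 500.0},
    load_graph=lambda: {"a": ["b"], "b": ["c"], "c": []},
    locate=locate_targets,
)
targets = detector.run()
```

Passing the units in as callables keeps each responsibility replaceable, which mirrors the way the device is described as a set of separate program units.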

In summary, in the detection device for abnormal services provided by the embodiment of the present application, M first services in a target system are determined by using a first processing unit 601, and a target data set of each first service is obtained, where the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set includes at least N target data, the target data includes performance data of the first services, and M and N are positive integers greater than 1; the first input unit 602 inputs the target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least comprises S first services with anomalies in a target system, and S is a positive integer greater than 1; the first obtaining unit 603 obtains a knowledge graph, where the knowledge graph at least includes information of M first services and a calling relationship between every two first services in the M first services; the first determining unit 604 determines a target service in the S first services according to the S first services and the knowledge graph, where the target service is a service that invokes T second services, and the second services are services other than the S first services in the M first services, and T is a positive integer, so that the problem in the related art that the effect of troubleshooting the abnormal service with the fault is poor due to the abnormal service with the fault is solved by manually troubleshooting the abnormal service with the fault in the system. 
Specifically, the performance data set of each service in the system to be detected is input into the prediction model for prediction processing, so that the plurality of faulty services in the system to be detected are predicted. A knowledge graph is then acquired, which includes information of the plurality of services in the system to be detected and the calling relationship between every two of those services. The source abnormal service is then determined according to the plurality of faulty services and the knowledge graph. Because faulty abnormal services no longer need to be checked manually, the efficiency and accuracy of troubleshooting faulty abnormal services in the system are improved, which in turn improves the overall troubleshooting effect.

Optionally, in the detection apparatus for abnormal services provided in the embodiment of the present application, the first determining unit includes: the first processing module is used for performing traversal processing on each first service in the S first services according to the knowledge graph, determining whether the service called by each first service in the S first services is a second service or not, and obtaining S traversal results; and the first determining module is used for taking the service calling the second service in the S first services as the target service if the S traversal results indicate that the service calling the second service exists in the S first services.

Optionally, in the detection device for abnormal services provided in the embodiment of the present application, the first input unit includes: the second processing module is used for preprocessing the data in the target data set of each first service to obtain preprocessed data, wherein the preprocessing is at least one of the following: aggregation processing, null value filling processing, and normalization processing; and the third processing module is used for clustering the preprocessed data by combining an unsupervised learning algorithm to obtain a prediction result, wherein the prediction result at least comprises S first services with abnormality in a target system.

Optionally, in the device for detecting abnormal services provided in the embodiment of the present application, if the preprocessing is aggregation processing, the second processing module includes: the first acquisition submodule is used for acquiring preset time period information; the first processing sub-module is used for carrying out aggregation processing on the data in the target data set of each first service according to the preset time period information to obtain preprocessed data.

Optionally, in the detection device for abnormal services provided by the embodiment of the present application, the target prediction model is obtained by: the second acquisition unit is used for acquiring a sample training set, wherein the sample training set at least comprises Q sample data, the sample data comprises performance data of a target system acquired in a history process, and Q is a positive integer greater than 1; the first training unit is used for performing unsupervised training on the original prediction model by using the sample training set based on an unsupervised learning algorithm to obtain a target prediction model.

Optionally, in the detection apparatus for abnormal services provided in the embodiment of the present application, the first acquisition unit includes: the first acquisition module is used for acquiring attribute information of M first services; the second determining module is used for determining the calling relation between every two first services in the M first services; and the second acquisition module is used for acquiring the knowledge graph based on the attribute information of the M first services and the calling relation between every two first services.

Optionally, in the device for detecting abnormal services provided in the embodiment of the present application, if the prediction result further includes abnormal probability and abnormal performance index data corresponding to each of the S first services, the device further includes: the third acquisition unit is used for acquiring attribute information of target services after determining the target services in the S first services according to the S first services and the knowledge graph; the second processing unit is used for summarizing the attribute information of the target service, the abnormal probability corresponding to each first service in the S first services and the abnormal performance index data to obtain a data information set, and sending the data information set to the target object, wherein the data information set is used for prompting the target object to process the S first services with the abnormality in the target system according to the data information set.

The detection device for abnormal services includes a processor and a memory, where the first processing unit 601, the first input unit 602, the first obtaining unit 603, the first determining unit 604, and the like are stored as program units, and the processor executes the program units stored in the memory to implement corresponding functions.

The processor includes one or more kernels, and a kernel fetches the corresponding program unit from the memory. By adjusting kernel parameters, the effect of troubleshooting abnormal services with faults in the system is improved.

The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.

An embodiment of the present invention provides a computer-readable storage medium having stored thereon a program that, when executed by a processor, implements a method of detecting an abnormal service.

An embodiment of the present invention provides a processor configured to run a program, wherein the program, when run, executes the method for detecting an abnormal service.

As shown in fig. 7, an embodiment of the present invention provides an electronic device, where the device includes a processor, a memory, and a program stored in the memory and executable on the processor, and when the processor executes the program, the following steps are implemented: determining M first services in a target system, and acquiring a target data set of each first service, wherein the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set at least comprises N target data, the target data comprise performance data of the first services, and M and N are positive integers greater than 1; inputting a target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least comprises S first services with anomalies in the target system, and S is a positive integer greater than 1; acquiring a knowledge graph, wherein the knowledge graph at least comprises information of the M first services and a calling relationship between every two first services in the M first services; and determining target services in the S first services according to the S first services and the knowledge graph, wherein the target services are services for calling T second services, the second services are services except the S first services in the M first services, and T is a positive integer.

The processor also realizes the following steps when executing the program: determining the target service in the S first services according to the S first services and the knowledge graph comprises: performing traversal processing on each first service in the S first services according to the knowledge graph, and determining whether the service called by each first service in the S first services is the second service or not to obtain S traversal results; and if the S traversal results indicate that the service calling the second service exists in the S first services, the service calling the second service in the S first services is used as the target service.

The processor also realizes the following steps when executing the program: inputting the target data set of each first service into a target prediction model for prediction processing, and obtaining a prediction result comprises: preprocessing data in the target data set of each first service to obtain preprocessed data, wherein the preprocessing is at least one of the following: aggregation processing, null value filling processing, and normalization processing; and clustering the preprocessed data by combining the unsupervised learning algorithm to obtain the prediction result, wherein the prediction result at least comprises the S first services with the abnormality in the target system.

The processor also realizes the following steps when executing the program: if the preprocessing is aggregation processing, preprocessing the data in the target data set of each first service to obtain preprocessed data, wherein the preprocessing comprises the following steps: acquiring preset time period information; and carrying out aggregation processing on the data in the target data set of each first service according to the preset time period information to obtain the preprocessed data.

The processor also realizes the following steps when executing the program: the target prediction model is obtained by the following steps: acquiring a sample training set, wherein the sample training set at least comprises Q sample data, the sample data comprises performance data of the target system acquired in a history process, and Q is a positive integer greater than 1; and performing unsupervised training on the original prediction model by using the sample training set based on the unsupervised learning algorithm to obtain the target prediction model.

The processor also realizes the following steps when executing the program: the obtaining of the knowledge graph comprises the following steps: acquiring attribute information of the M first services; determining a calling relationship between every two first services in the M first services; and acquiring the knowledge graph based on the attribute information of the M first services and the calling relation between every two first services.

The processor also realizes the following steps when executing the program: if the prediction result further includes abnormal probability and abnormal performance index data corresponding to each of the S first services, after determining the target service in the S first services according to the S first services and the knowledge graph, the method further includes: acquiring attribute information of the target service; summarizing attribute information of the target service, abnormal probability corresponding to each first service in the S first services and abnormal performance index data to obtain a data information set, and sending the data information set to a target object, wherein the data information set is used for prompting the target object to process the S first services with the abnormality in the target system according to the data information set.

The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.

The application also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with the method steps of: determining M first services in a target system, and acquiring a target data set of each first service, wherein the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set at least comprises N target data, the target data comprise performance data of the first services, and M and N are positive integers greater than 1; inputting a target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least comprises S first services with anomalies in the target system, and S is a positive integer greater than 1; acquiring a knowledge graph, wherein the knowledge graph at least comprises information of the M first services and a calling relationship between every two first services in the M first services; and determining target services in the S first services according to the S first services and the knowledge graph, wherein the target services are services for calling T second services, the second services are services except the S first services in the M first services, and T is a positive integer.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the method steps of: determining the target service in the S first services according to the S first services and the knowledge graph comprises: performing traversal processing on each first service in the S first services according to the knowledge graph, and determining whether the service called by each first service in the S first services is the second service or not to obtain S traversal results; and if the S traversal results indicate that the service calling the second service exists in the S first services, the service calling the second service in the S first services is used as the target service.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the method steps of: inputting the target data set of each first service into a target prediction model for prediction processing, and obtaining a prediction result comprises: preprocessing data in the target data set of each first service to obtain preprocessed data, wherein the preprocessing is at least one of the following: aggregation processing, null value filling processing, and normalization processing; and clustering the preprocessed data by combining the unsupervised learning algorithm to obtain the prediction result, wherein the prediction result at least comprises the S first services with the abnormality in the target system.

When executed on a data processing device, the computer program product is further adapted to carry out a program initialized with the method steps of: if the preprocessing is aggregation processing, preprocessing the data in the target data set of each first service to obtain preprocessed data, wherein the preprocessing comprises the following steps: acquiring preset time period information; and carrying out aggregation processing on the data in the target data set of each first service according to the preset time period information to obtain the preprocessed data.

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: the target prediction model is obtained as follows: acquiring a sample training set, wherein the sample training set comprises at least Q sample data, the sample data comprise historically collected performance data of the target system, and Q is a positive integer greater than 1; and performing unsupervised training on an original prediction model with the sample training set, based on the unsupervised learning algorithm, to obtain the target prediction model.
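
To make the training step concrete, the sketch below fits a model on unlabeled historical samples. It is not the model of the application: a per-metric z-score baseline is swapped in as the simplest label-free stand-in, and the class name `ZScoreModel`, the threshold of 3.0, and the two metrics (latency and CPU ratio) are assumptions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class ZScoreModel:
    """Stand-in 'target prediction model': per-metric mean and standard deviation."""
    means: List[float]
    stds: List[float]
    threshold: float = 3.0

    def is_abnormal(self, sample: Sequence[float]) -> bool:
        # A sample is abnormal if any metric deviates by more than `threshold` sigmas.
        return any(
            abs(x - m) / s > self.threshold
            for x, m, s in zip(sample, self.means, self.stds)
        )


def train(samples: Sequence[Sequence[float]], threshold: float = 3.0) -> ZScoreModel:
    """Fit the baseline on Q unlabeled historical samples (Q must exceed 1)."""
    if len(samples) < 2:
        raise ValueError("the sample training set needs more than one sample")
    columns = list(zip(*samples))  # one column per performance metric
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for a constant metric to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return ZScoreModel(means, stds, threshold)


# Hypothetical history: (latency in ms, CPU usage ratio) per sample.
history = [(20.0, 0.30), (22.0, 0.32), (21.0, 0.31), (19.0, 0.29), (23.0, 0.33)]
model = train(history)
print(model.is_abnormal((21.5, 0.31)), model.is_abnormal((250.0, 0.95)))
```

No labels are used at any point: the model only learns what historical performance data usually look like and flags samples that fall far outside that range.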

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: acquiring the knowledge graph comprises: acquiring attribute information of the M first services; determining the calling relationship between every two of the M first services; and obtaining the knowledge graph based on the attribute information of the M first services and the calling relationship between every two first services.
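
A minimal way to picture this construction is a structure holding node attributes plus directed "calls" edges. The sketch below assumes an in-memory representation; the class `KnowledgeGraph`, the function `build_knowledge_graph`, and the attribute keys are hypothetical, and a real deployment might use a graph database instead.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class KnowledgeGraph:
    """Service attributes (nodes) and directed calling relationships (edges)."""
    attributes: Dict[str, Dict[str, str]] = field(default_factory=dict)
    calls: Dict[str, Set[str]] = field(default_factory=dict)


def build_knowledge_graph(
    service_attributes: Dict[str, Dict[str, str]],
    call_pairs: Iterable[Tuple[str, str]],
) -> KnowledgeGraph:
    """Build the graph from attribute information and (caller, callee) pairs."""
    graph = KnowledgeGraph()
    for name, attrs in service_attributes.items():
        graph.attributes[name] = dict(attrs)
        graph.calls[name] = set()
    for caller, callee in call_pairs:
        # Only relationships between known services are accepted.
        if caller not in graph.attributes or callee not in graph.attributes:
            raise KeyError(f"unknown service in call pair: {caller} -> {callee}")
        graph.calls[caller].add(callee)
    return graph


# Hypothetical attribute information and calling relationships.
service_attributes = {
    "order": {"owner": "team-a", "node": "node-1"},
    "payment": {"owner": "team-b", "node": "node-2"},
    "inventory": {"owner": "team-c", "node": "node-3"},
}
call_pairs = [("order", "payment"), ("order", "inventory")]
graph = build_knowledge_graph(service_attributes, call_pairs)
print(sorted(graph.calls["order"]))
```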

When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: if the prediction result further includes an abnormal probability and abnormal performance index data corresponding to each of the S first services, then after determining the target service among the S first services according to the S first services and the knowledge graph, the method further comprises: acquiring attribute information of the target service; and summarizing the attribute information of the target service, together with the abnormal probability and the abnormal performance index data corresponding to each of the S first services, to obtain a data information set, and sending the data information set to a target object, wherein the data information set is used to prompt the target object to handle the S abnormal first services in the target system.
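
The summarizing step can be sketched as assembling one serializable record. The field names, the JSON encoding, and the function `build_data_information_set` below are assumptions for illustration; how the data information set is actually delivered to the target object (for example, an operations engineer) is not specified here and is left out.

```python
import json
from typing import Any, Dict


def build_data_information_set(
    target_attributes: Dict[str, Dict[str, str]],
    predictions: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Combine target-service attributes with per-service abnormality details."""
    return {
        "target_services": target_attributes,
        "abnormal_services": [
            {
                "service": name,
                "abnormal_probability": details["probability"],
                "abnormal_metrics": details["metrics"],
            }
            for name, details in sorted(predictions.items())
        ],
    }


# Hypothetical inputs: attributes of the target service and model output.
target_attributes = {"payment": {"owner": "team-b", "node": "node-2"}}
predictions = {
    "payment": {"probability": 0.88, "metrics": {"latency_ms": 238.0}},
    "order": {"probability": 0.93, "metrics": {"latency_ms": 250.0}},
}
report = build_data_information_set(target_attributes, predictions)
payload = json.dumps(report, sort_keys=True)  # ready to hand to a notification channel
print(payload)
```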

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims

1. A method for detecting an abnormal service, comprising:

determining M first services in a target system, and acquiring a target data set of each first service, wherein the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set at least comprises N target data, the target data comprise performance data of the first services, and M and N are positive integers greater than 1;

inputting a target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least comprises S first services with anomalies in the target system, and S is a positive integer greater than 1;

acquiring a knowledge graph, wherein the knowledge graph at least comprises information of the M first services and a calling relationship between every two first services in the M first services;

and determining target services in the S first services according to the S first services and the knowledge graph, wherein the target services are services for calling T second services, the second services are services except the S first services in the M first services, and T is a positive integer.

2. The method of claim 1, wherein determining a target service of the S first services from the S first services and the knowledge graph comprises:

performing traversal processing on each first service in the S first services according to the knowledge graph, and determining whether the service called by each first service in the S first services is the second service or not to obtain S traversal results;

and if the S traversal results indicate that the service calling the second service exists in the S first services, the service calling the second service in the S first services is used as the target service.

3. The method of claim 1, wherein inputting the target data set for each first service into the target prediction model for prediction processing, and obtaining the prediction result comprises:

preprocessing data in the target data set of each first service to obtain preprocessed data, wherein the preprocessing is at least one of the following: aggregation processing, null-value filling processing, and normalization processing;

and clustering the preprocessed data by combining the unsupervised learning algorithm to obtain the prediction result, wherein the prediction result at least comprises the S first services with the abnormality in the target system.

4. The method of claim 3, wherein if the preprocessing is aggregation processing, preprocessing the data in the target data set of each first service to obtain preprocessed data comprises:

acquiring preset time period information;

and carrying out aggregation processing on the data in the target data set of each first service according to the preset time period information to obtain the preprocessed data.

5. The method of claim 1, wherein the target prediction model is obtained by:

acquiring a sample training set, wherein the sample training set at least comprises Q sample data, the sample data comprises performance data of the target system acquired in a history process, and Q is a positive integer greater than 1;

and performing unsupervised training on the original prediction model by using the sample training set based on the unsupervised learning algorithm to obtain the target prediction model.

6. The method of claim 1, wherein acquiring the knowledge graph comprises:

acquiring attribute information of the M first services;

determining a calling relationship between every two first services in the M first services;

and acquiring the knowledge graph based on the attribute information of the M first services and the calling relation between every two first services.

7. The method of claim 1, wherein if the prediction result further includes abnormal probability and abnormal performance index data corresponding to each of the S first services, after determining the target service in the S first services according to the S first services and the knowledge graph, the method further includes:

acquiring attribute information of the target service;

summarizing attribute information of the target service, abnormal probability corresponding to each first service in the S first services and abnormal performance index data to obtain a data information set, and sending the data information set to a target object, wherein the data information set is used for prompting the target object to process the S first services with the abnormality in the target system according to the data information set.

8. An abnormal service detection apparatus, comprising:

the first processing unit is used for determining M first services in a target system and acquiring a target data set of each first service, wherein the target system is a system to be detected, the first services are background programs or processes running in the target system, the target data set at least comprises N target data, the target data comprise performance data of the first services, and M and N are positive integers greater than 1;

the first input unit is used for inputting the target data set of each first service into a target prediction model for prediction processing to obtain a prediction result, wherein the target prediction model is a model constructed based on an unsupervised learning algorithm, and the prediction result at least comprises S first services with anomalies in the target system, and S is a positive integer greater than 1;

the first acquisition unit is used for acquiring a knowledge graph, wherein the knowledge graph at least comprises information of the M first services and calling relations between every two first services in the M first services;

the first determining unit is configured to determine a target service in the S first services according to the S first services and the knowledge graph, where the target service is a service that invokes T second services, the second services are services other than the S first services in the M first services, and T is a positive integer.

9. A computer-readable storage medium storing a program, wherein the program executes the method for detecting an abnormal service according to any one of claims 1 to 7.

10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of detecting an abnormal service of any of claims 1-7.