CN117056166A - Data anomaly detection method and device, storage medium and electronic equipment - Google Patents

Data anomaly detection method and device, storage medium and electronic equipment

Publication number
CN117056166A
Authority
CN
China
Prior art keywords
data
service
determining
time sequence
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311031839.7A
Other languages
Chinese (zh)
Inventor
黄�俊
杨阳
余航
李建国
郑啸
刘向阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202311031839.7A
Publication of CN117056166A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3065: Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072: Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3051: Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2433: Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Debugging And Monitoring (AREA)

Abstract

One or more embodiments of the present disclosure provide a method, an apparatus, a storage medium, and an electronic device for detecting data anomalies, which relate to the technical field of data processing. The method comprises the following steps: determining index time sequence data, log time sequence data and call chain time sequence data in a target micro-service system, wherein the target micro-service system comprises a plurality of micro-services; determining a plurality of service instances corresponding to the plurality of micro services based on the index time sequence data and the log time sequence data; determining a service call relationship between a plurality of service instances based on the call chain timing data; generating a plurality of directed graphs with time sequence relations based on the plurality of service instances and service calling relations among the plurality of service instances; and determining a data abnormality detection result corresponding to the target micro-service system based on the plurality of directed graphs.

Description

Data anomaly detection method and device, storage medium and electronic equipment
Technical Field
One or more embodiments of the present disclosure relate to the field of data processing technologies, and in particular, to a method and apparatus for detecting data anomalies, a storage medium, and an electronic device.
Background
With the development of cloud computing, software system architecture has gradually shifted from a monolithic architecture to a service-oriented architecture. A service-oriented architecture can adapt to rapid growth in system scale and offers faster iteration, lower development complexity and better scalability. However, it greatly increases deployment and operation complexity, which poses challenges for operation and maintenance tasks such as fault detection and diagnosis.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a data anomaly detection method, apparatus, storage medium, and electronic device.
In a first aspect, one or more embodiments of the present disclosure provide a data anomaly detection method, including: determining index time sequence data, log time sequence data and call chain time sequence data in a target micro-service system, wherein the target micro-service system comprises a plurality of micro-services; determining a plurality of service instances corresponding to the plurality of micro services based on the index time sequence data and the log time sequence data; determining a service call relationship between a plurality of service instances based on the call chain timing data; generating a plurality of directed graphs with time sequence relations based on the plurality of service instances and service calling relations among the plurality of service instances; and determining a data abnormality detection result corresponding to the target micro-service system based on the plurality of directed graphs.
With reference to the first aspect, in some implementations, determining, based on the plurality of directed graphs, a data anomaly detection result corresponding to the target micro-service system includes: determining spatial dependency information between the plurality of service instances based on the plurality of directed graphs; determining time dependency information between the index time sequence data, the log time sequence data, and the call chain time sequence data based on the plurality of directed graphs; and determining the data anomaly detection result corresponding to the target micro-service system based on the plurality of directed graphs, the spatial dependency information and the time dependency information.
With reference to the first aspect, in some implementations, determining the data anomaly detection result corresponding to the target micro-service system based on the plurality of directed graphs, the spatial dependency information, and the time dependency information includes: updating the plurality of directed graphs based on the spatial dependency information and the time dependency information to obtain prediction errors of the feature data corresponding to the plurality of service instances and of the feature data corresponding to the service call relationships; updating some of the plurality of directed graphs based on the spatial dependency information and the time dependency information to obtain reconstruction errors of the feature data corresponding to the plurality of service instances and of the feature data corresponding to the service call relationships; and determining the data anomaly detection result corresponding to the target micro-service system based on the reconstruction errors and the prediction errors.
With reference to the first aspect, in some implementations, when the reconstruction error is determined, the time dependency information is obtained through a masked time attention network.
With reference to the first aspect, in some implementations, determining spatial dependency information between the plurality of service instances based on the plurality of directed graphs includes: determining the spatial dependency information between the plurality of service instances based on the plurality of directed graphs and a graph attention network.
With reference to the first aspect, in some implementations, determining the index time sequence data, the log time sequence data, and the call chain time sequence data in the target micro-service system includes: acquiring index data, log data and call chain data in the target micro-service system; and performing structuring and time serialization on the index data, the log data and the call chain data respectively to obtain the index time sequence data, the log time sequence data and the call chain time sequence data in the target micro-service system.
With reference to the first aspect, in some implementations, vertex features of the directed graph are determined based on the index time sequence data and the log time sequence data, and edge features of the directed graph are determined based on the call chain time sequence data.
In a second aspect, one or more embodiments of the present disclosure provide a data anomaly detection apparatus, including: the first determining module is used for determining index time sequence data, log time sequence data and call chain time sequence data in a target micro-service system, wherein the target micro-service system comprises a plurality of micro-services; the second determining module is used for determining a plurality of service instances corresponding to the plurality of micro services based on the index time sequence data and the log time sequence data; the third determining module is used for determining service calling relations among the plurality of service instances based on the calling chain time sequence data; the generation module is used for generating a plurality of directed graphs with time sequence relations based on the service instances and the service calling relations among the service instances; and the fourth determining module is used for determining the data abnormality detection result corresponding to the target micro-service system based on the plurality of directed graphs.
In a third aspect, one or more embodiments of the present description provide a computer-readable storage medium storing a computer program for performing the method mentioned in the first aspect.
In a fourth aspect, one or more embodiments of the present description provide an electronic device comprising: a processor; a memory for storing processor-executable instructions; the processor is adapted to perform the method mentioned in the first aspect.
In a fifth aspect, one or more embodiments of the present description provide a computer program product comprising instructions which, when executed, enable the method mentioned in the first aspect to be carried out.
In the embodiments of this specification, anomaly judgment uses the index time sequence data, the log time sequence data and the call chain time sequence data together. Because the different data sources differ in their sensitivity to anomalies and in their ability to identify them, combining them can improve the accuracy of data anomaly detection. In addition, the embodiments use directed graphs to integrate and present the index time sequence data, the log time sequence data and the call chain time sequence data. The directed graphs can serve as a virtual representation of the physical-world micro-service system, which improves the reliability and safety of the model corresponding to the physical system. Meanwhile, because the graph structure of the directed graph is fixed, what changes in the directed graph is mainly the vertex features and the edge features. Various anomaly detection algorithms can therefore be applied, which reduces the limitations of the data anomaly detection method and improves its generality.
Drawings
The above and other objects, features and advantages of the embodiments of the present specification will become more apparent from the more detailed description of the embodiments given below with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the present specification and constitute a part of this specification. Together with the embodiments, they serve to explain the present specification and do not constitute a limitation on the embodiments. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a schematic diagram of an implementation environment related to a data anomaly detection method.
Fig. 2 is a flowchart illustrating a method for detecting data anomalies according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating the construction of a directed graph according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic flow chart of determining a data anomaly detection result according to an exemplary embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating a process of determining a data anomaly detection result according to another exemplary embodiment of the present disclosure.
Fig. 6a is a schematic diagram illustrating the updating of a service instance according to an exemplary embodiment of the present disclosure.
Fig. 6b is a schematic diagram illustrating the updating of a call relationship according to an exemplary embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of a timing anomaly detection model according to an exemplary embodiment of the present disclosure.
Fig. 8 is a block diagram of a time attention network according to an exemplary embodiment of the present disclosure.
Fig. 9 is a schematic flow chart of determining time series data according to an exemplary embodiment of the present disclosure.
Fig. 10 is a schematic diagram showing a structure of a data anomaly detection apparatus according to an exemplary embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
The technical solutions of the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present specification, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
Summary of the application
With the development of cloud computing, software system architecture has gradually shifted from a monolithic architecture to a service-oriented architecture. In a traditional monolithic architecture, application components are tightly coupled and strongly interdependent, which makes testing and upgrading difficult. A service-oriented architecture, by contrast, arose to meet the high-concurrency, high-performance and high-availability requirements of today's internet back-end services, and can be deployed on the cloud. It can adapt to rapid growth in system scale and offers faster iteration, lower development complexity and better scalability, but it greatly increases deployment and operation and maintenance complexity, which poses challenges for operation and maintenance work such as fault detection and diagnosis.
Anomaly detection refers to finding unexpected behavior patterns in data, and is an important branch of machine learning. Existing operation and maintenance systems have a low degree of automation, long anomaly handling times and high labor costs; they depend heavily on expert experience, handle a large number of anomalies manually, and struggle with anomaly detection in complex systems. Intelligent operation and maintenance, which combines big data and machine learning to automate IT (Information Technology) operation processes, including event correlation, anomaly detection and causal relationship determination, has become an essential tool for monitoring and managing hybrid, distributed and componentized modern IT environments. Based on operation and maintenance big data, artificial intelligence algorithms are used to realize intelligent application scenarios such as anomaly detection, fault prediction, root cause analysis and capacity prediction, which can comprehensively improve operation and maintenance capability and reduce operation and maintenance cost.
Next, a common data anomaly detection method is analyzed.
The SCWarn anomaly detection method uses metrics and logs for anomaly detection. Metrics and logs are both time series data, and an LSTM (Long Short-Term Memory) network is used to learn the time dependence of each data source and obtain a new data representation. A fully connected layer then performs multi-modal representation fusion to obtain the anomaly score of each data source. However, this approach is susceptible to data drift.
The HADES anomaly detection method also uses metrics and logs for anomaly detection. To account for the correlation among multiple metrics, the metrics are grouped, 1-dimensional causal convolution is used to extract the features of similar metrics, and the features of the different groups are then fused to obtain a metric feature representation. A Transformer encoder is used to obtain a log feature representation. The metric feature representation and the log feature representation are then updated with a cross-attention mechanism. Finally, a fully connected layer is applied to the two updated feature representations to obtain the anomaly judgment.
Both SCWarn and HADES use metrics and logs as multi-modal data, but they process the data differently. First, SCWarn uses an LSTM to capture the time dependence of data changes within a sliding window, whereas HADES performs feature extraction on the data within a sliding window: it uses causal convolution to capture the temporal ordering of the metrics, but does not emphasize time dependence for the logs. Second, the two also differ in multi-source data fusion: SCWarn directly applies a fully connected layer, whereas HADES uses cross-attention to let the multi-source data interact.
However, the foregoing anomaly detection methods are still designed for a monolithic architecture, and therefore cannot properly determine how anomalies propagate between service instances. Furthermore, after obtaining the feature representations of the multi-source data, SCWarn concatenates them directly without considering the differences between the data sources, and so lacks interaction between the multi-source data. HADES adopts a semi-supervised strategy: it trains a model with labeled data, uses the model to label the unlabeled data, and then trains the model on the combined loss of the labeled data and the model-labeled data. However, because the new data is labeled by the model, bias is unavoidable.
Exemplary scenario
Fig. 1 is a schematic diagram of an implementation environment related to a data anomaly detection method. As shown in fig. 1, the implementation environment includes a server 10 and an electronic device 20. The electronic device 20 may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, etc., and the server 10 may be a stand-alone server or may be a server cluster formed by a plurality of servers. More specifically, the electronic device 20 may be a data analysis device for network services or a data analysis device for non-network services. The electronic device 20 specifies the target data and transmits a data abnormality detection request to the server 10. The server 10 performs anomaly detection on the target data, and returns the detection result to the electronic device 20 for display to the user or the operation and maintenance personnel.
The electronic device 20 may be an analysis device for daily transaction amount, page views PV (Page View) or unique visitors UV (Unique Visitor), an access-volume analysis device for a mobile application APP (Application), or an analysis device for data such as the server CPU (Central Processing Unit) load of a large supermarket or the daily sales of a commodity. In the embodiments of this specification, the anomaly detection result for the specified time series data of each electronic device 20 may be displayed to operation and maintenance personnel in the form of a graph, and the result may serve as a basis for subsequent anomaly handling.
In one example, the user specifies time series data (including index time series data, log time series data, and call chain time series data) to be detected through the electronic device 20, and requests abnormality detection of the data from the server 10, for example, the user specifies that the data name is "daily transaction amount of the electronic commerce platform". In this case, the server 10 acquires these data from the corresponding data sources, and then, based on these history data, determines whether the data at the latest time point is an abnormal point. The data source may be a data source located in other service ends, such as a server end of an e-commerce platform.
In another example, a user specifies time series data comprised of a series of data points through electronic device 20, requesting abnormality detection of the time series data from server 10. In this case, the abnormality detection request includes a series of data points, and the server 10 obtains a series of data points to be detected from the abnormality detection request, performs abnormality detection processing on these data, and finds an abnormality point therein.
Exemplary method
Fig. 2 is a flowchart illustrating a method for detecting data anomalies according to an exemplary embodiment of the present disclosure. As shown in fig. 2, the data anomaly detection method provided in the embodiment of the present specification includes the following steps.
Step S210, determining index time sequence data, log time sequence data and call chain time sequence data in the target micro-service system.
The target micro-service system includes a plurality of micro-services. The index time sequence data is a real-valued time series that measures the state of the system, and includes index data describing memory, the network, processes and disk space; for example, the index time sequence data includes response time, thread count, request processing time, traffic, and the like. The log time sequence data records the operational status of the system. The call chain time sequence data is the end-to-end link view of a request in a distributed system.
Multi-source heterogeneous data of the target micro-service system is acquired, including index data, log data and call chain data. Because each data source differs in its sensitivity to anomalies and in its ability to identify them, the data sources are fused for anomaly detection analysis. Further, the multi-source heterogeneous data is preprocessed to obtain numerical index time sequence data, log time sequence data and call chain time sequence data.
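The preprocessing step can be illustrated with a minimal sketch for log data. It assumes that log lines have already been parsed into (timestamp, template id) pairs and that time serialization means counting events per fixed-width window; the function name `logs_to_series`, the window width and the template ids are hypothetical and are not taken from the specification.

```python
from collections import Counter

def logs_to_series(events, start, end, width, templates):
    """Count log events per template in fixed-width time windows.

    events: iterable of (timestamp_seconds, template_id) pairs (assumed pre-parsed).
    Returns one row per window; each row holds one count per template.
    """
    n_windows = (end - start + width - 1) // width  # ceiling division
    counts = [Counter() for _ in range(n_windows)]
    for ts, template in events:
        if start <= ts < end:
            counts[(ts - start) // width][template] += 1
    return [[c[t] for t in templates] for c in counts]

# Hypothetical parsed log events: (seconds since start, template id).
events = [(0, "E1"), (5, "E2"), (12, "E1"), (14, "E1"), (31, "E2")]
series = logs_to_series(events, start=0, end=40, width=10, templates=["E1", "E2"])
print(series)
```

Each row of `series` is the log feature vector of one time window, so the result has the same shape as a numerical index time series and can be aligned with it by timestamp.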
Step S220, determining a plurality of service instances corresponding to the plurality of micro services based on the index time series data and the log time series data.
Both the index time sequence data and the log time sequence data are descriptions and records of service instances. The record objects described by the index time sequence data and the log time sequence data are determined from the information these data contain, and the plurality of service instances corresponding to the plurality of micro-services are thereby determined.
Step S230, based on the call chain time sequence data, determining service call relations among a plurality of service instances.
The call chain time sequence data, i.e., trace data, is the end-to-end link view of a request in a distributed system. A SPAN represents a view between different service instances within the link, and a Trace is a combination of SPANs. A SPAN stores much key information, such as the time point, call time, record object, SPAN id, status code, request type, parent SPAN id and Trace id. Using the parent-child relationships of the SPANs in a Trace, the call dependencies between service instances and the record object of each parent SPAN can be obtained. That is, the service call relationships between the plurality of service instances can be determined from the call chain time sequence data.
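The derivation of call relationships from SPAN parent-child links can be sketched as follows. This is an illustration only: the dictionary keys `span_id`, `parent_id` and `instance`, the instance names, and the choice to ignore calls within a single instance are assumptions, not details given in the specification.

```python
def call_edges(spans):
    """Derive (caller, callee) pairs from SPAN parent-child links.

    spans: list of dicts with hypothetical keys 'span_id', 'parent_id'
    (None for the root SPAN) and 'instance' (the record object).
    """
    owner = {s["span_id"]: s["instance"] for s in spans}
    edges = set()
    for s in spans:
        parent = s["parent_id"]
        if parent is not None and parent in owner:
            caller, callee = owner[parent], s["instance"]
            if caller != callee:  # assumption: skip calls inside one instance
                edges.add((caller, callee))
    return edges

# One hypothetical Trace made of four SPANs.
trace = [
    {"span_id": "a", "parent_id": None, "instance": "gateway-0"},
    {"span_id": "b", "parent_id": "a", "instance": "order-1"},
    {"span_id": "c", "parent_id": "b", "instance": "payment-0"},
    {"span_id": "d", "parent_id": "b", "instance": "order-1"},
]
edges = call_edges(trace)
print(sorted(edges))
```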
Step S240, based on the service instances and the service call relations among the service instances, generating a plurality of directed graphs with time sequence relations.
The directed graph in the embodiments of this specification has a fixed graph structure, which consists of the vertices and the edges of the graph. The vertices of the directed graph are the service instances, and the edges are determined by the service call relationships between the service instances. More specifically, because the index time sequence data and the log time sequence data are essentially descriptions and records of service instances, while the call chain time sequence data is an end-to-end link view, the vertices of the directed graph correspond to the index time sequence data and the log time sequence data, and the edges correspond to the call chain time sequence data.
Step S250, determining a data abnormality detection result corresponding to the target micro service system based on the plurality of directed graphs.
For example, for the plurality of directed graphs with a time sequence relationship, the change over time of the feature data belonging to the same service instance is compared across the directed graphs. If the change satisfies a preset normal condition, the target micro-service system is considered to be operating normally; otherwise, the data in the target micro-service system is determined to be abnormal. Further, the specific object with abnormal data in the target micro-service system is determined according to the correspondence between the vertices and edges of the directed graph and physical entities. For example, if a vertex in the directed graph has a data anomaly, the physical entity corresponding to that vertex is abnormal, where the physical entity may be a physical machine, a micro-service, a system or a container; if an edge in the directed graph is abnormal, the data is abnormal when one micro-service is called by another micro-service.
Anomaly detection refers to identifying rare items, events, or observations that differ significantly from most of the data. The data anomaly detection in the embodiments of this specification includes time-series anomaly detection, that is, finding data instances in a time series that deviate significantly from the other observed values.
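A minimal sketch of this comparison is given below. It assumes that the preset normal condition is a bound on the absolute change of each vertex feature between consecutive graphs; the function name `abnormal_vertices`, the threshold and the feature values are hypothetical, and the specification does not fix a particular condition.

```python
def abnormal_vertices(graphs, max_change):
    """Flag vertices whose features change by more than max_change between consecutive graphs.

    graphs: list of vertex-feature tables ordered by time; graphs[t][i] is the
    feature list of service instance i at time t.
    """
    flagged = set()
    for prev, curr in zip(graphs, graphs[1:]):
        for i, (old, new) in enumerate(zip(prev, curr)):
            if any(abs(n - o) > max_change for o, n in zip(old, new)):
                flagged.add(i)
    return sorted(flagged)

# Three hypothetical snapshots of two service instances with two features each.
graphs = [
    [[0.20, 0.30], [0.50, 0.40]],
    [[0.22, 0.31], [0.52, 0.95]],
    [[0.21, 0.29], [0.51, 0.41]],
]
flagged = abnormal_vertices(graphs, max_change=0.3)
print(flagged)
```

The returned indices identify vertices, so they map directly back to the service instances (physical entities) whose data is abnormal.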
In the embodiments of this specification, anomaly judgment uses the index time sequence data, the log time sequence data and the call chain time sequence data together. Because the different data sources differ in their sensitivity to anomalies and in their ability to identify them, combining them can improve the accuracy of data anomaly detection. In addition, the embodiments use directed graphs to integrate and present the index time sequence data, the log time sequence data and the call chain time sequence data. The directed graphs can serve as a virtual representation of the physical-world micro-service system, which improves the reliability and safety of the model corresponding to the physical system. Meanwhile, because the graph structure of the directed graph is fixed, what changes in the directed graph is mainly the vertex features and the edge features. Various anomaly detection algorithms can therefore be applied, which reduces the limitations of the data anomaly detection method and improves its generality.
Fig. 3 is a schematic diagram illustrating the construction of a directed graph according to an exemplary embodiment of the present disclosure. As shown in fig. 3, SL represents a Service Instance. The vertices of the directed graph are the service instances, and the edges of the directed graph are determined by examining the SPAN parent-child relationships in all the call chain time sequence data, so that an adjacency matrix $A = \{a_{ij}\}$, $i, j \in \{1, \dots, N\}$, can be obtained.
Wherein, [the formula defining $a_{ij}$ is missing from the source text].
To sum up, the directed graph at time point $t$ is $G_t = \langle V_t, E_t \rangle$, with $V_t = M_t \,\|\, L_t$ and $E_t = S_t$, where $\|$ denotes concatenating the index time sequence data and the log time sequence data along the last dimension, $V_t$ represents the vertex features of the directed graph at time point $t$, and $E_t$ represents the edge features of the directed graph at time point $t$. The multiple directed graphs with time sequence relationships can therefore be expressed as [expression missing from the source text].
In practical applications, the scheduling of service instances is highly concurrent and massive, and scheduling between service instances always exists, so the structure of the directed graph at each timestamp is almost identical and stable. To simplify the problem and speed up data anomaly detection, the graph structure of the directed graphs for all timestamps can be fixed, which means that $a_{ij}$ is fixed.
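The construction of the graph at one time point can be sketched as follows. Because the formula for the adjacency matrix entries is not reproduced in this text, the sketch assumes the common encoding in which an entry is 1 when instance i calls instance j and 0 otherwise. The function names, instance names and feature values are hypothetical.

```python
def adjacency(instances, edges):
    """Build the adjacency matrix; entry [i][j] is 1 when instance i calls instance j (assumed encoding)."""
    index = {name: k for k, name in enumerate(instances)}
    a = [[0] * len(instances) for _ in instances]
    for caller, callee in edges:
        a[index[caller]][index[callee]] = 1
    return a

def snapshot(metrics_t, logs_t, spans_t):
    """Return (V_t, E_t): vertex features are metrics joined with logs, edge features are span features."""
    vertices = [m + l for m, l in zip(metrics_t, logs_t)]  # concatenate along the last dimension
    return vertices, spans_t

instances = ["gateway-0", "order-1", "payment-0"]
A = adjacency(instances, [("gateway-0", "order-1"), ("order-1", "payment-0")])
M_t = [[0.2, 35.0], [0.7, 80.0], [0.1, 12.0]]  # hypothetical metric features per instance
L_t = [[3], [9], [1]]                          # hypothetical log-event counts per instance
S_t = {(0, 1): [120.0], (1, 2): [45.0]}        # hypothetical call latency per edge
V_t, E_t = snapshot(M_t, L_t, S_t)
print(A, V_t, E_t)
```

Since the graph structure is fixed, `A` is built once and only `V_t` and `E_t` are recomputed for each timestamp.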
Fig. 4 is a schematic flow chart of determining a data anomaly detection result according to an exemplary embodiment of the present disclosure. The embodiment shown in fig. 4 extends the embodiment shown in fig. 2. The following description focuses on the differences between the two embodiments; the parts they have in common are not repeated.
As shown in fig. 4, in the embodiment of the present disclosure, the data anomaly detection result corresponding to the target micro-service system is determined based on a plurality of directed graphs, including the following steps.
Step S410, based on the plurality of directed graphs, determines spatial dependency information between the plurality of service instances.
Illustratively, a spatial attention network is used to determine the spatial dependency information between the plurality of service instances in the plurality of directed graphs. The spatial attention network models the complex correlations between different data sources at the local level of the directed graph. More specifically, it uses a message-passing mechanism on the directed graph to update the vertex features and edge features, captures the interactions between different service instances, and determines the spatial dependencies between the plurality of service instances.
Step S420, based on the plurality of directed graphs, determines time-dependent information between the indicator timing data, the log timing data, and the call chain timing data.
In a micro-service system, the multi-source data is generated sequentially; that is, the index time sequence data, the log time sequence data, and the call chain time sequence data exhibit temporal trends. Modeling the temporal dependence of the data within the sequence can improve data anomaly detection performance, so the embodiments of this specification adopt a time attention network to capture the time dependence among the data points in the sliding window.
Step S430, determining a data anomaly detection result corresponding to the target micro-service system based on the plurality of directed graphs, the space-dependent information and the time-dependent information.
For example, according to the multiple directed graphs, the space-dependent information and the time-dependent information, the vertex features and the edge features in the directed graphs can be updated, the vertex features and the edge features in the directed graphs before and after updating are compared, and the data anomaly detection result of the target micro-service system is determined. For example, when the comparison result shows that the directed graph has no difference before and after updating, the data abnormality detection result of the target micro-service system is normal, otherwise, the data abnormality detection result is abnormal.
With the development of cloud computing, the architecture of software systems has gradually shifted from a monolithic architecture to a service-oriented architecture. Simply reusing anomaly detection methods designed for a monolithic architecture ignores the mutual influence among service instances. Therefore, a directed graph is constructed according to the call relationships among the service instances, and, taking into account the temporal nature of the time sequence data (namely the index time sequence data, the log time sequence data, and the call chain time sequence data), spatio-temporal attention is used to describe the time dependence and the space dependence of the services. That is, the long-term dependency relationships between service instances can be better captured through the space dependency information and the time dependency information, and the performance of data anomaly detection can be improved. The vertex features in the directed graph are obtained from the index time sequence data and the log time sequence data, and the edge features are obtained from the call chain time sequence data. Using the complementary information of the different data sources can improve the generalization capability of the data anomaly detection algorithm and thus the accuracy of the data anomaly detection result.
Fig. 5 is a flowchart illustrating a process of determining a data anomaly detection result according to another exemplary embodiment of the present disclosure. The embodiment shown in fig. 5 extends the embodiment shown in fig. 4. The following description focuses on the differences between the two embodiments; the parts they have in common are not repeated.
As shown in fig. 5, in the embodiment of the present specification, the data anomaly detection result corresponding to the target micro-service system is determined based on the plurality of directed graphs, the spatial dependency information, and the time dependency information, including the following steps.
Step S510, based on the space dependence information and the time dependence information, updating the plurality of directed graphs to obtain the feature data corresponding to the plurality of service instances and the prediction errors of the feature data corresponding to the service calling relations.
The feature data corresponding to the service instances are the vertex features in the directed graph, and the feature data corresponding to the service call relationships are the edge features in the directed graph. After the directed graph is updated, vertex features and edge features are obtained that correspond to the vertex features and edge features of the directed graph before the update. The prediction error is determined using the vertex features and edge features of the directed graph before the update and those after the update.
Step S520, based on the space dependence information and the time dependence information, updating part of the directed graphs in the plurality of directed graphs to obtain the feature data corresponding to the plurality of service instances and the reconstruction errors of the feature data corresponding to the service calling relations.
And judging whether the vertex characteristics and the edge characteristics in the directed graph can be mutually reconstructed or not based on all the information in the directed graph collected at the current time point, and if so, determining a reconstruction error.
Step S530, determining the data anomaly detection result corresponding to the target micro-service system based on the reconstruction error and the prediction error.
In some implementations, if the vertex features and the edge features in the directed graph cannot be reconstructed with each other, it is determined that a data exception exists in a physical entity or call relationship corresponding to the vertex or edge that cannot be reconstructed. If the reconstruction can be performed, determining an anomaly score corresponding to the vertex or the edge in the directed graph according to the prediction error and the reconstruction error, and if the anomaly score is greater than a preset threshold, determining that the physical entity corresponding to the vertex has data anomaly or the calling relationship corresponding to the edge has data anomaly.
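The thresholding step described above can be illustrated with a short sketch. How the prediction error and the reconstruction error are combined into one anomaly score is not specified in this passage, so the weighted sum and the `weight` parameter below are assumptions made only for illustration, as are the function names.

```python
def anomaly_scores(pred_err, recon_err, weight=0.5):
    """Combine per-vertex (or per-edge) errors into anomaly scores.

    The weighted sum is an assumed combination rule, not the source's formula.
    """
    return [weight * p + (1.0 - weight) * r for p, r in zip(pred_err, recon_err)]


def flag_anomalies(scores, threshold):
    """Return the indices whose anomaly score exceeds the preset threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


scores = anomaly_scores([0.1, 0.9, 0.2], [0.2, 0.8, 0.1])
flagged = flag_anomalies(scores, threshold=0.5)
```

Each flagged index corresponds to a vertex (a service instance) or an edge (a call relationship) that would be reported as having a data anomaly.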
Fig. 6a is a schematic diagram of an update service instance provided in an exemplary embodiment of the present disclosure, and fig. 6b is a schematic diagram of an update call relationship provided in an exemplary embodiment of the present disclosure. In fig. 6a, when updating the vertex feature, the information of its directed neighbors will be passed to it along the edges of the directed graph with a weight. Similarly, edge features can also be updated by transforming the roles of vertices and edges in the directed graph. As shown in fig. 6b, the feature representation of the current modality may be updated and improved by obtaining complementary information from the data of the other modalities.
In some cases, there is some uncertainty in determining the data anomaly detection result using either the prediction error or the reconstruction error alone. For example, the vertex features and edge features of the directed graph may fail to be restored; however, given the complex relationships within the directed graph and the micro-service system, a failure to restore does not necessarily mean that a data anomaly exists, and, conversely, a successful restoration does not necessarily mean that no data anomaly exists. Therefore, in the embodiments of this specification, the reconstruction error and the prediction error are used jointly to determine the data anomaly detection result corresponding to the target micro-service system, which provides a two-way verification basis for data anomaly detection and further improves its accuracy.
In some embodiments, when the reconstruction error is determined, the time dependency information is obtained through a masked time attention network.
In some embodiments, determining spatial dependency information between a plurality of service instances based on a plurality of directed graphs includes: determining the spatial dependency information between the plurality of service instances based on the plurality of directed graphs and a graph attention network.
Fig. 7 is a schematic structural diagram of a timing anomaly detection model according to an exemplary embodiment of the present disclosure. As shown in fig. 7, the timing anomaly detection model is composed of an encoder module and a decoder module.
In order to learn the time dependence of the time point t, it is necessary to perform context window division on multi-source data (including index timing data, log timing data, and call chain timing data), and the specific division operation is as shown in formula (1).
$D_t = \{G_{t-k+1}, \dots, G_t\} = \{M_{t-k+1:t}, L_{t-k+1:t}, S_{t-k+1:t}\}$ (1)
where k is the sliding window size. From the time dimension, $D_t$ may be represented as a collection of directed graphs. From the data-source perspective, $D_t$ consists of three parts: index time sequence data, log time sequence data, and call chain time sequence data. Thus, $D = \{D_k, \dots, D_T\}$. Illustratively, for each sliding window, the label of the last directed graph is taken as the label of the entire sliding window, i.e., $Y = \{Y_k, \dots, Y_T\}$, where $Y_t \in \{-1, 0, 1\}$; -1 represents abnormal data, 0 represents data with an unknown label, and 1 represents normal data. These labels are used to train the time sequence anomaly detection model to be trained, so as to obtain a time sequence anomaly detection model that meets the training conditions.
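The window division in formula (1) and the labelling rule can be sketched as follows. The function name and the use of plain lists are illustrative choices; time indices are 0-based here, whereas the text counts from 1.

```python
def sliding_windows(graphs, labels, k):
    """Split a time-ordered list of graphs into windows D_t = {G_{t-k+1}, ..., G_t}.

    Each window takes the label of its last directed graph.
    """
    windows, window_labels = [], []
    for t in range(k - 1, len(graphs)):
        windows.append(graphs[t - k + 1 : t + 1])
        window_labels.append(labels[t])
    return windows, window_labels


graphs = ["G1", "G2", "G3", "G4"]   # placeholders for directed graphs
labels = [1, 1, 0, -1]              # 1 normal, 0 unknown label, -1 abnormal
windows, window_labels = sliding_windows(graphs, labels, k=3)
```

With T graphs and window size k, this yields T - k + 1 windows, matching $D = \{D_k, \dots, D_T\}$.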
As shown in fig. 7, input encoding (Input encoding) changes the feature dimension of the multi-source data, which facilitates multi-head attention processing and prevents the dimension of any single data source from being too large. In addition, position codes are added along the time dimension according to the order within the sliding window, so as to emphasize the sequential order. Specifically, the workflow of Input encoding may be defined as the following operation:
$x' = xW + b + \mathrm{PositionalEncoding}(x)$
where $x = \{m, l, t\}$, and $raw_*$ and $F_*$ are respectively the original dimension and the transformed dimension of a given data source. This operation adjusts the feature dimension and highlights the time dimension, while the service-instance dimension remains unchanged.
To sum up: $D'_t = \{M'_{t-k+1:t}, L'_{t-k+1:t}, S'_{t-k+1:t}\} \leftarrow \mathrm{InputEmbed}(D_t)$
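A minimal sketch of the input encoding $x' = xW + b + \mathrm{PositionalEncoding}(x)$ for one data source and one service instance is given below. The text does not say which positional encoding is used; the sinusoidal form from the Transformer literature is assumed here, and `W` and `b` are fixed toy values rather than learned parameters.

```python
import math


def positional_encoding(k, f):
    """Sinusoidal position codes for k time steps and f features (assumed form)."""
    pe = [[0.0] * f for _ in range(k)]
    for pos in range(k):
        for i in range(f):
            angle = pos / (10000 ** (2 * (i // 2) / f))
            pe[pos][i] = math.sin(angle) if i % 2 == 0 else math.cos(angle)
    return pe


def input_encoding(x, w, b):
    """x: k x raw window, w: raw x F matrix, b: length-F bias. Returns a k x F window."""
    k, f = len(x), len(b)
    pe = positional_encoding(k, f)
    return [
        [sum(x[t][r] * w[r][j] for r in range(len(w))) + b[j] + pe[t][j] for j in range(f)]
        for t in range(k)
    ]


x = [[1.0, 2.0], [3.0, 4.0]]                      # k = 2 time steps, raw = 2
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # raw = 2 -> F = 4
b = [0.0, 0.0, 0.0, 0.0]
encoded = input_encoding(x, W, b)
```

Only the feature dimension changes (2 to 4 in this toy case); the number of time steps stays the same, and the position code depends only on the position inside the sliding window.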
The spatial attention network (Spatial Attention Module, SAM) uses a message-passing mechanism on the directed graph to update the vertex features and edge features of the directed graph, capturing interactions between different service instances.
Illustratively, the SAM consists of two graph attention network (Graph Attention Network, GAT) layers: one GAT layer performs feature computation between service instances, and the other performs computation between the service scheduling (call) relationships of the service instances. Because the attention coefficients in the GAT computation can refer to the edge features of the directed graph, feature interaction between service instances and service scheduling can be perceived sensitively, and feature interaction among the call chain time sequence data, the index time sequence data, and the log time sequence data can be performed effectively, as shown in fig. 6a and 6b. The workflow of the SAM is shown in formula (2) and formula (3).
Wherein:
LeakyReLU(x)=max(x,0)+min(x,0)×ξ,ξ=0.01
W denotes a learnable matrix, including $W_1$ and $W_3$; $W_5 \in R^{d}$ and $W_7 \in R^{d'}$. SL represents the vertex features of the directed graph and EF represents the edge features (Edge Features) of the directed graph. $Near(i) \cup \{i\}$ represents the nodes adjacent to the i-th node together with node i itself. $V_{t,i}$ represents the feature of node i at time point t, and $\alpha_{t,ij}$ represents the node spatial attention coefficient between node i and node j at time point t. $E_{t,ij}$ represents the feature of the edge between node i and node j at time point t. $\beta_{t,iju}$ represents the edge spatial attention coefficient between the edge from node i to node j and the edge from node j to node u at time point t. exp() represents the exponential operation, and d and d' are constants related to the dimension.
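The LeakyReLU definition and the normalization over $Near(i) \cup \{i\}$ can be sketched as below. Formulas (2) and (3) are not reproduced in this text, so the unnormalized scores are passed in as a hypothetical `raw_scores` matrix instead of being computed from $W$, $V_{t,i}$, and $E_{t,ij}$; treating j as a neighbor of i when `adj[i][j]` is set is likewise an assumption.

```python
import math

XI = 0.01  # slope for negative inputs, as given in the text


def leaky_relu(x, xi=XI):
    """LeakyReLU(x) = max(x, 0) + min(x, 0) * xi."""
    return max(x, 0.0) + min(x, 0.0) * xi


def node_attention(raw_scores, adj, i):
    """Softmax of LeakyReLU scores over Near(i) ∪ {i}; returns {j: alpha_ij}."""
    hood = [j for j in range(len(adj)) if adj[i][j] or j == i]
    logits = {j: leaky_relu(raw_scores[i][j]) for j in hood}
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {j: math.exp(v - top) for j, v in logits.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
raw_scores = [[0.5, 1.0, 9.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
alpha = node_attention(raw_scores, adj, 0)
```

Node 2 is not adjacent to node 0 in this toy graph, so its large raw score has no influence on the coefficients of node 0.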
In summary, combining the multi-head mechanism yields the following, where $h_p$ and $h_e$ represent the number of heads of the two operations:
The time attention network (Temporal Attention Module, TAM) is intended to capture the time dependence of the time sequence data; fig. 8 is a block diagram of the time attention network provided in an exemplary embodiment of the present specification. Because the different data sources (including the index time sequence data and the log time sequence data) all in fact describe the service instances, their time dependence should be the same. Therefore, when the time attention scores of the different data sources are calculated, the calculation of each single-modality time attention is guided by the overall time attention, which can also be regarded as interaction among the multi-source data features. In the embodiments of this specification, the time attention networks of the encoder module and the decoder module differ: in the encoder module, every time point in the sliding window can see all the data, whereas in the decoder module, each time point in the sliding window can only see information from the current and earlier time points, which the masked time attention network achieves.
The calculation formulas involved in the time attention network are as formula (4) to formula (6):
$ATT = \mathrm{mean}(att(M_{t-k+1:t}, M_{t-k+1:t}), att(L_{t-k+1:t}, L_{t-k+1:t}), att(S_{t-k+1:t}, S_{t-k+1:t}))$ (5)
where $M_{t-k+1:t}$, $L_{t-k+1:t}$, and $S_{t-k+1:t}$ represent the index time sequence data, the log time sequence data, and the call chain time sequence data in the sliding window $D_t$. d is the feature size of the inputs Q and K. $W_Q, W_K, W_V \in R^{d \times d}$ are learnable parameter matrices. Mask() sets the upper-triangular entries of the correlation coefficient matrix to 0, which achieves the masking effect discussed above.
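Formula (5) and the masking rule can be illustrated with the sketch below. Formulas (4) and (6) are not reproduced in this text, so `att` is assumed to be scaled dot-product attention, and the learnable projections $W_Q$ and $W_K$ are omitted for brevity. The mask is applied by setting future positions to negative infinity before the softmax, which makes the corresponding coefficients exactly 0 afterwards.

```python
import math


def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def att(q, k, mask=False):
    """Assumed scaled dot-product attention coefficients over the time steps."""
    n, d = len(q), len(q[0])
    scores = [
        [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in range(n)]
        for i in range(n)
    ]
    if mask:  # hide the upper triangle, i.e. future time points
        for i in range(n):
            for j in range(i + 1, n):
                scores[i][j] = float("-inf")
    return [softmax(row) for row in scores]


def total_attention(m, l, s, mask=False):
    """ATT = mean(att(M, M), att(L, L), att(S, S)), as in formula (5)."""
    mats = [att(m, m, mask), att(l, l, mask), att(s, s, mask)]
    n = len(m)
    return [[sum(a[i][j] for a in mats) / 3.0 for j in range(n)] for i in range(n)]


M = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row per time step in the window
L = [[0.5], [0.2], [0.1]]
S = [[1.0], [2.0], [3.0]]
ATT_enc = total_attention(M, L, S)             # encoder: all time points visible
ATT_dec = total_attention(M, L, S, mask=True)  # decoder: current and earlier only
```

In the encoder variant every coefficient is positive, while in the decoder variant the first time point attends only to itself.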
In summary, the following deductions are made by combining the multi-head mechanism:
$h_M$, $h_L$, and $h_S$ represent the number of heads for the different data sources, respectively.
The cross-attention network (Cross Attention Module) takes the hidden state $Z_t$ produced by the encoder and helps the decoder obtain guidance information from $Z_t$. To achieve this goal, a multi-head attention method may be employed.
$\mathrm{MultiHeadAtt}(Q, K, V) = \mathrm{Concat}(H_1, \dots, H_h)$
where Q represents the query vector, K represents the key vector, V represents the value vector, and the feature sizes of Q, K, and V are mapped to m. Assume that the encoded information is E and the information in the decoder is D; both consist of index time sequence data, log time sequence data, and call chain time sequence data.
In a cross-attention network:
A feed-forward network (Feed Forward Network, FFN) is used to map low-dimensional features to a higher dimension, which makes it easier for the data anomaly detection model to discriminate. The calculation involved in the feed-forward network is shown in formula (7).
where the weights are all learnable parameter matrices, and $b_1$ and $b_2$ are the corresponding biases. In the feed-forward network, for the multi-source data:
$D'_t \leftarrow \mathrm{FFN}(D_t) = \{\mathrm{FFN}(M_{t-k+1:t}), \mathrm{FFN}(L_{t-k+1:t}), \mathrm{FFN}(S_{t-k+1:t})\}$
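Formula (7) is not reproduced in this text. As a sketch only, the feed-forward network is assumed below to take the common two-layer form $\mathrm{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2$; the weight names and the toy values are illustrative.

```python
def linear(x, w, b):
    """Row-wise affine map: x is n x in, w is in x out, b has length out."""
    return [
        [sum(xi * w[r][j] for r, xi in enumerate(row)) + b[j] for j in range(len(b))]
        for row in x
    ]


def ffn(x, w1, b1, w2, b2):
    """Assumed form: ReLU(x W1 + b1) W2 + b2."""
    hidden = [[max(v, 0.0) for v in row] for row in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


x = [[1.0, -1.0]]
W1 = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]    # 2 -> 3: expand to a higher dimension
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 -> 2
b2 = [0.5, 0.5]
out = ffn(x, W1, b1, W2, b2)
```

The same function would be applied separately to the M, L, and S parts of the window, in line with $D'_t \leftarrow \mathrm{FFN}(D_t)$.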
Normalization network (Norm):
where γ and β are both learnable parameters, namely the gain and bias applied to x, and ε is a very small constant.
For the multi-source data:
$D'_t \leftarrow \mathrm{Norm}(D_t) = \{\mathrm{LayerNorm}(M_{t-k+1:t}), \mathrm{LayerNorm}(L_{t-k+1:t}), \mathrm{LayerNorm}(S_{t-k+1:t})\}$
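The normalization formula itself is not reproduced in this text. The sketch below assumes the usual layer normalization, $\gamma \cdot (x - \mu) / \sqrt{\sigma^2 + \varepsilon} + \beta$, applied to one feature vector, with γ as the gain and β as the bias as described above.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


y = layer_norm([1.0, 2.0, 3.0], gamma=[1.0, 1.0, 1.0], beta=[0.0, 0.0, 0.0])
```

The small constant ε keeps the division well defined when all features in the vector are equal.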
In summary, the data in the data anomaly detection model flows as follows: for the current time t, there is a sliding window $D_t$, from which the inputs of the encoding part and of the decoding part are derived. RightShift() means that a zero entry is added on the left of $D_t$ and the information of $G_t$ is discarded.
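The RightShift() operation described above can be sketched in a few lines. The window is modelled as a plain list and `zero` stands for an all-zero placeholder graph; both are illustrative choices.

```python
def right_shift(window, zero):
    """Prepend a zero entry and drop the last element G_t, keeping the window length."""
    return [zero] + window[:-1]


shifted = right_shift(["G1", "G2", "G3"], zero="0")
```

The shifted window has the same length as $D_t$ but no longer contains $G_t$, so the decoder cannot simply copy the graph it is asked to produce.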
The workflow of the encoder's network layers can be defined as follows:
Spatial attention network:
Time attention network:
Feed-forward network:
The workflow of the decoder's network layers can be defined as follows:
Spatial attention network:
Masked time attention network:
Cross-attention network:
Feed-forward network:
In the data anomaly detection model, the loss function is designed as follows:
For an input sliding window $D_t$, a reconstruction window $O_t$ can be obtained. Considering the time dependence and the labels, the difference between the last directed graphs $G_t$ and $G'_t$ of $D_t$ and $O_t$ is taken as the anomaly score of the window, as shown in formula (8). To avoid having to choose a threshold for deciding whether a window is abnormal, the score is mapped to a two-dimensional vector that serves as the probability scores of the window being normal and abnormal, as shown in formula (9).
$\mathrm{Anomaly\ Score}\ (AS) = \mathrm{sum}(\|G_t - G'_t\|^2)$ (8)
where $\|G_t - G'_t\|^2 = R \circ R$, $\circ$ denotes multiplication of the elements at corresponding positions of two matrices, and R represents the difference between the feature data of directed graph $G_t$ and that of directed graph $G'_t$, built as a concatenation whose terms include $(M_t - M'_t)$ and $(L_t - L'_t)$. W is a learnable parameter matrix used to map S to the same dimension as M and L.
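Formula (8) can be sketched as a sum of squared element-wise differences. For simplicity the sketch assumes that the features of $G_t$ and $G'_t$ have already been concatenated into matrices of the same shape, so the learnable mapping of S by W is not shown.

```python
def anomaly_score(g, g_rec):
    """AS = sum(R ∘ R), with R the element-wise difference between g and g_rec."""
    return sum(
        (a - b) ** 2
        for row, row_rec in zip(g, g_rec)
        for a, b in zip(row, row_rec)
    )


G_t = [[1.0, 2.0], [3.0, 4.0]]      # features of the last graph in the window
G_t_rec = [[1.0, 1.0], [1.0, 4.0]]  # reconstructed features
AS = anomaly_score(G_t, G_t_rec)
```

A perfectly reconstructed graph gives a score of 0, and the score grows with the squared size of the reconstruction gap.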
For directed graphs of known labels, the loss calculation can be performed directly using the classification. For directed graphs of unknown labels, it is also desirable to clearly distinguish by way of reconstruction: it is desirable that the anomaly score of the anomaly directed graph is large, while the score of the normal directed graph is small. Since anomalies are a small probability event, most of the data should be normal, so the score for unknown tags should be as small as possible. To ensure a stable determination of the model for known anomalies, we set parameter variations for both Loss.
where $n_a$ represents the number of abnormal samples in the known-label data, $n_n$ represents the number of normal samples in the known-label data, and $n = n_a + n_n$; η is a constant that controls learning from the unknown-label data; n is the number of known-label data and m is the number of unknown-label data; the weights of the abnormal data and the normal data address the imbalance in the number of labels; and θ refers to the model parameters.
In some embodiments of the present specification, after the directed graph is generated, the multi-source data may be divided by single time point rather than by sliding window. For example, the features of the directed graph may be updated using a GCN or GAT method, and a feature representation of the entire directed graph may be calculated. With Deep SVDD, the feature representation of the entire directed graph can be mapped into a hypersphere, and whether a data anomaly exists in the directed graph can be judged by calculating the distance from the center of the sphere.
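The distance test used in this alternative can be sketched as below. The network that produces the graph-level feature representation (GCN or GAT followed by the Deep SVDD mapping) is not shown; `embedding`, `center`, and `radius` are hypothetical inputs, and comparing the distance with a fixed radius is an assumed decision rule.

```python
import math


def svdd_distance(embedding, center):
    """Euclidean distance from a graph-level embedding to the hypersphere center."""
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(embedding, center)))


def is_anomalous(embedding, center, radius):
    """Flag the graph when its embedding falls outside the hypersphere."""
    return svdd_distance(embedding, center) > radius


dist = svdd_distance([3.0, 4.0], [0.0, 0.0])
outside = is_anomalous([3.0, 4.0], [0.0, 0.0], radius=4.0)
inside = is_anomalous([1.0, 1.0], [0.0, 0.0], radius=4.0)
```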
Next, the performance of the data anomaly detection model was verified by some experimental data.
MSDS is a recent high-quality multi-source dataset composed of distributed traces, application logs, and metrics from a complex distributed system. The dataset is constructed specifically for artificial intelligence operations, including automatic anomaly detection, root cause analysis, and remediation. The experiments were therefore performed using the MSDS dataset.
The ratio of the training set to the test set was 7:3. Since both HADES and the method in this specification are semi-supervised methods, part of the data labels need to be masked. The ratio of known-label data to unknown-label data is about 1:1, and the ratio of normal labels to abnormal labels within the known labels is about 80:1. The experiments used precision (PR), recall (RC), area under the receiver operating characteristic curve (ROC-AUC), average precision (AP, Average Precision), and F1 score to evaluate the anomaly detection performance of all models.
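For reference, three of the listed metrics can be computed as in the sketch below. It follows the label convention used earlier in this specification (-1 abnormal, 1 normal) and treats the abnormal class as the positive class, which is an assumption about how the experiments were scored; the sample labels are made up for illustration and are not experimental data.

```python
def precision_recall_f1(y_true, y_pred, positive=-1):
    """Precision, recall, and F1 with the abnormal label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [-1, -1, 1, 1, 1]  # illustrative labels only
y_pred = [-1, 1, -1, 1, 1]
pr, rc, f1 = precision_recall_f1(y_true, y_pred)
```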
Table 1 experimental results
TraceAnomaly is an anomaly detection method that performs PU learning on trace call times and uses the reconstruction error of the time vectors to determine anomaly scores. Although TraceAnomaly can achieve a very high recall, it needs to wait for the whole trace call to complete, so it is not a real-time method. PLELog is a semi-supervised anomaly detection method that incorporates log semantic information. It is affected by the size of the log sliding window: different service instances generate different numbers of logs, and the number of logs differs greatly between abnormal and normal conditions, so dividing all logs with the same sliding window is problematic and leads to performance deviation. TranAD and USAD are anomaly detection methods based on index data. USAD is an AutoEncoder architecture built from linear layers; it can capture only certain characteristics of the data in the sliding window and therefore cannot fully capture anomaly information. TranAD employs a Transformer architecture with embedded position coding, but does not consider interactions between different service instances. SCWarn integrates multiple LSTMs, using an LSTM to capture the time dependence of each data source and using the distribution of reconstruction scores on the training data to judge anomalies in the test data, but it considers neither data distribution drift nor the relationships among the multi-source data. HADES uses cross-attention to combine log and index data effectively, but requires pre-training the model with existing labeled data. Labeling the unknown-label data with pseudo labels from a pre-trained model requires sufficient and complete labeled data for pre-training; however, abnormal data is difficult to identify, so complete abnormal data is hard to obtain.
In the embodiment of the present description, a directed graph is constructed by using the call relationship between SPAN, so that the interaction between vertices can be understood. In addition, the information among the multi-source data can be effectively extracted by utilizing the space attention network and the time attention network, so that the multi-source data fusion is realized, and the abnormality is commonly judged.
Fig. 9 is a schematic flow chart of determining time sequence data according to an exemplary embodiment of the present disclosure. The embodiment shown in fig. 9 extends the embodiment shown in fig. 2. The following description focuses on the differences between the two embodiments; the parts they have in common are not repeated.
As shown in fig. 9, in the embodiment of the present specification, the index timing data, the log timing data, and the call chain timing data in the target micro service system are determined, including the following steps.
Step S910, index data, log data and call chain data in the target micro service system are acquired.
And step S920, carrying out structural processing and time serialization processing on the index data, the log data and the call chain data in the target micro-service system to obtain index time sequence data, log time sequence data and call chain time sequence data of the target micro-service system.
To address the unstructured nature of the multi-source data (including index data, log data, and call chain data) and the inconsistent scales among them, formula (10) can be used to preprocess the multi-source data T to obtain numerical time sequence data.
where the three components of the multi-source data respectively represent the index data, the log data, and the call chain data.
For the index data, the value ranges and recording standards of the individual indicators differ. To prevent data anomaly detection performance from degrading because some index data is excessively large or does not change, the index data needs to be normalized and recorded by time point.
For the index data, $M_t$ represents the index data at time t, $t \in [1, T]$, where T represents the running time of the target micro-service system, N represents the number of service instances, $F_m$ represents the number of indicators, and R represents the real numbers. The index data $M_t$ at time point t is normalized, and the specific process is shown in formula (11).
where the maximum value and the minimum value are taken over the index data, and ε is a small constant.
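Formula (11) is not reproduced in this text. The sketch below assumes the usual min-max form, $(x - \min) / (\max - \min + \varepsilon)$, applied to the time series of a single indicator; the small constant ε keeps the division well defined when an indicator never changes.

```python
def min_max_normalize(series, eps=1e-8):
    """Scale one indicator's time series into [0, 1) using its own minimum and maximum."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo + eps) for v in series]


cpu = min_max_normalize([2.0, 4.0, 6.0])   # a changing indicator
flat = min_max_normalize([5.0, 5.0, 5.0])  # an indicator that never changes
```

An unchanging indicator maps to all zeros instead of causing a division by zero.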
For the log data, the log data is recorded by time point. Log data generally contains information such as the time point, the logged object, and the log content. Because log data is unstructured, it is difficult to use directly and needs to be parsed and converted into structured information. Illustratively, log parsing is performed with the Drain3 algorithm to obtain log templates, and a log template index is generated for each piece of log data. To keep the same statistical measure as the index data, the log data at each time point can be counted by log template. $L_t$ represents the log data at time t, where $F_l$ is the number of log templates. Similarly, the log data is normalized, and the specific process is shown in formula (12):
where the maximum value and the minimum value are taken over the log data.
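The per-time-point counting by log template can be sketched as follows. Template extraction itself (for example with Drain3) is not shown: each log entry is assumed to have already been reduced to a hypothetical `(time_point, template_index)` pair, and a single service instance is considered for brevity.

```python
def count_by_template(events, num_points, num_templates):
    """Count, for each time point, how many log entries match each log template."""
    counts = [[0] * num_templates for _ in range(num_points)]
    for time_point, template_index in events:
        counts[time_point][template_index] += 1
    return counts


# (time_point, template_index) pairs for one service instance
events = [(0, 0), (0, 0), (0, 2), (1, 1)]
L_counts = count_by_template(events, num_points=2, num_templates=3)
```

Each row of `L_counts` is a numeric vector of length $F_l$ that can then be normalized in the same way as the index data.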
For the call chain data, to preserve the integrity of the SPAN information, statistics are collected according to the end time of each SPAN to obtain the call chain time sequence data. Because a SPAN has the property of a duration, the call chain data can be represented as follows: $S_t$ represents the call chain data at time t, where $F_s$ is the number of call chain request types. When the call chain data is normalized, SPAN durations may be as short as tens of milliseconds or as long as tens of seconds, and some service instances do not exchange any messages, so min-max normalization would lose information. To distinguish a very small amount of message passing from no message passing, approximate normalization is used here to process the call chain data, and the specific process is shown in formula (13).
where the average is taken over the call chain data.
In the embodiments of this specification, the index data, the log data, and the call chain data are converted into index time sequence data, log time sequence data, and call chain time sequence data respectively, which makes it convenient to determine the abnormal condition of the service instances at the same time point. In addition, normalizing the index data, the log data, and the call chain data further unifies the recording standards and measurement scales of the three kinds of data and improves data anomaly detection performance. Processing the log data with log templates also filters out irrelevant data, yielding log data that is more useful for data anomaly detection.
The method embodiment of the present application is described above in detail with reference to fig. 2 to 9, and the apparatus embodiment of the present application is described below in detail with reference to fig. 10. It is to be understood that the description of the method embodiments corresponds to the description of the device embodiments, and that parts not described in detail can therefore be seen in the preceding method embodiments.
Fig. 10 is a schematic diagram showing a structure of a data anomaly detection apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 10, the data anomaly detection apparatus 100 provided in the embodiment of the present specification includes:
a first determining module 1010, configured to determine index timing data, log timing data, and call chain timing data in a target micro-service system, where the target micro-service system includes a plurality of micro-services;
a second determining module 1020, configured to determine a plurality of service instances corresponding to the plurality of micro services based on the index timing data and the log timing data;
a third determining module 1030, configured to determine a service call relationship between a plurality of service instances based on the call chain timing data;
a generating module 1040, configured to generate a plurality of directed graphs having a timing relationship based on a plurality of service instances and service call relationships among the plurality of service instances;
A fourth determining module 1050, configured to determine a data anomaly detection result corresponding to the target micro service system based on the multiple directed graphs.
In some embodiments of the present disclosure, the fourth determining module 1050 is further configured to determine, based on the plurality of directed graphs, spatial dependency information between the plurality of service instances; determining time dependency information between the indicator timing data, the log timing data, and the call chain timing data based on the plurality of directed graphs; and determining a data abnormality detection result corresponding to the target micro-service system based on the plurality of directed graphs, the space dependence information and the time dependence information.
In some embodiments of the present disclosure, the fourth determining module 1050 is further configured to update the plurality of directed graphs based on the spatial dependency information and the time dependency information, to obtain the feature data corresponding to the plurality of service instances and the prediction error of the feature data corresponding to the service call relationship; based on the space dependence information and the time dependence information, updating part of the directed graphs in the plurality of directed graphs to obtain feature data corresponding to a plurality of service instances and reconstruction errors of the feature data corresponding to the service calling relations; and determining a data abnormality detection result corresponding to the target micro-service system based on the reconstruction error and the prediction error.
In some embodiments of the present description, when the reconstruction error is determined, the time dependency information is obtained through a masked time attention network.
In some embodiments of the present description, the fourth determination module 1050 is further configured to determine spatial dependency information between the plurality of service instances based on the plurality of directed graphs and a graph attention network.
In some embodiments of the present disclosure, the first determining module 1010 is further configured to: acquire index data, log data, and call chain data in the target micro-service system; and perform structuring and time-series processing on the index data, the log data, and the call chain data to obtain the index time sequence data, log time sequence data, and call chain time sequence data of the target micro-service system.
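One possible reading of structuring followed by time-series processing, shown for log data only, is sketched below: each raw message is reduced to a template, and template occurrences are counted per service instance and per fixed time window. The digit-masking rule, the 60-second window, and the record layout `(timestamp, instance, message)` are assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict


def to_template(message):
    """Crude structuring: replace every run of digits with a placeholder."""
    return re.sub(r"\d+", "<*>", message)


def log_time_series(records, window_s=60):
    """Count log templates per instance and per time window.

    records: iterable of (unix_timestamp, instance, message)
    Returns {instance: {window_start: Counter({template: count})}}.
    """
    series = defaultdict(lambda: defaultdict(Counter))
    for ts, instance, message in records:
        window_start = int(ts) // window_s * window_s
        series[instance][window_start][to_template(message)] += 1
    return series


# Usage with three hypothetical log records from one instance.
series = log_time_series([
    (0, "svc-a", "timeout after 30 ms"),
    (10, "svc-a", "timeout after 45 ms"),
    (61, "svc-a", "ok"),
])
```

Index data and call chain data could be bucketed into the same windows so that all three sources share one time axis.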
In some embodiments of the present description, vertex features of each directed graph are determined based on the index timing data and the log timing data, and edge features of each directed graph are determined based on the call chain timing data.
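The mapping from the three data sources to graph features can be sketched as follows: one directed graph per time window, with vertex features concatenated from index and log values and edge features taken from call chain values. The `GraphSnapshot` class, the input dictionaries keyed by window, and the choice of concatenation are hypothetical and serve only to illustrate the structure.

```python
from dataclasses import dataclass, field


@dataclass
class GraphSnapshot:
    """Directed graph for one time window (hypothetical container)."""
    window: int
    vertex_features: dict = field(default_factory=dict)  # instance -> list of floats
    edge_features: dict = field(default_factory=dict)    # (caller, callee) -> list of floats


def build_snapshots(index_data, log_data, call_data):
    """Build time-ordered graph snapshots.

    index_data: {(window, instance): [floats]}        index (metric) values
    log_data:   {(window, instance): [floats]}        log-derived values
    call_data:  {(window, caller, callee): [floats]}  call chain values
    """
    snapshots = {}
    for (window, instance), values in index_data.items():
        snap = snapshots.setdefault(window, GraphSnapshot(window))
        # Vertex feature = index values followed by log values (assumed concatenation).
        snap.vertex_features[instance] = list(values) + list(
            log_data.get((window, instance), [])
        )
    for (window, caller, callee), values in call_data.items():
        snap = snapshots.setdefault(window, GraphSnapshot(window))
        snap.edge_features[(caller, callee)] = list(values)
    return [snapshots[w] for w in sorted(snapshots)]


# Usage: two windows, one call edge in the first window.
snaps = build_snapshots(
    {(0, "a"): [0.5], (0, "b"): [0.2], (60, "a"): [0.9]},
    {(0, "a"): [3.0]},
    {(0, "a", "b"): [12.0, 0.0]},
)
```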
An electronic device according to one or more embodiments of the present specification is described below with reference to fig. 11. Fig. 11 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
As shown in fig. 11, the electronic device 110 includes one or more processors 1101 and a memory 1102.
The processor 1101 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 110 to perform desired functions.
Memory 1102 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 1101 to implement the methods of the various embodiments of the present description described above and/or other desired functions. Various contents, such as the index timing data, the log timing data, the call chain timing data, and the plurality of directed graphs, may also be stored in the computer-readable storage medium.
In one example, the electronic device 110 may further include: an input device 1103 and an output device 1104, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 1103 may include, for example, a keyboard, a mouse, and the like.
The output device 1104 can output various information to the outside, including index timing data, log timing data, call chain timing data, a plurality of directed graphs, and the like. The output device 1104 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, for simplicity, only some of the components of the electronic device 110 that are relevant to one or more embodiments of the present description are shown in fig. 11, with components such as buses, input/output interfaces, etc. omitted. In addition, the electronic device 110 may include any other suitable components depending on the particular application.
In addition to the methods and apparatus described above, embodiments of the present description may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the methods according to the various embodiments of the present description described above.
Program code of the computer program product for performing the operations of embodiments of the present description may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present description may also be a computer-readable storage medium, on which computer program instructions are stored, which, when being executed by a processor, cause the processor to perform the steps in the method according to various embodiments of the present description described above.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of one or more embodiments of the present disclosure have been described above in connection with specific embodiments. However, it should be noted that the advantages, benefits, effects, etc. mentioned in one or more embodiments of the present disclosure are merely examples and not limiting, and are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and the description is not limited to those specific details.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in one or more embodiments of the present description are presented as illustrative examples only and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to," and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It should also be noted that in the apparatus, devices and methods of the present description, the components or steps may be separated and/or recombined. Such decomposition and/or recombination should be considered as equivalents of the present description.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use one or more embodiments of the present description. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the description is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the specification to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (10)

1. A data anomaly detection method, comprising:
determining index time sequence data, log time sequence data and call chain time sequence data in a target micro-service system, wherein the target micro-service system comprises a plurality of micro-services;
determining a plurality of service instances corresponding to the plurality of micro-services based on the index time sequence data and the log time sequence data;
determining a service call relationship between the plurality of service instances based on the call chain time sequence data;
generating a plurality of directed graphs with time sequence relations based on the plurality of service instances and service calling relations among the plurality of service instances;
and determining a data abnormality detection result corresponding to the target micro-service system based on the plurality of directed graphs.
2. The method of claim 1, wherein determining, based on the plurality of directed graphs, a data anomaly detection result corresponding to the target micro-service system comprises:
determining spatial dependency information between the plurality of service instances based on the plurality of directed graphs;
determining time dependency information between the index time sequence data, the log time sequence data, and the call chain time sequence data based on the plurality of directed graphs;
and determining a data anomaly detection result corresponding to the target micro-service system based on the plurality of directed graphs, the spatial dependency information, and the time dependency information.
3. The method of claim 2, wherein determining the data anomaly detection result corresponding to the target micro-service system based on the plurality of directed graphs, the spatial dependency information, and the time dependency information comprises:
updating the plurality of directed graphs based on the spatial dependency information and the time dependency information, to obtain a prediction error for the feature data corresponding to the plurality of service instances and for the feature data corresponding to the service call relationship;
updating some of the plurality of directed graphs based on the spatial dependency information and the time dependency information, to obtain a reconstruction error for the feature data corresponding to the plurality of service instances and for the feature data corresponding to the service call relationship;
and determining a data anomaly detection result corresponding to the target micro-service system based on the reconstruction error and the prediction error.
4. The method of claim 3, wherein the time dependency information used when determining the reconstruction error is obtained through a masked temporal attention network.
5. The method of claim 2, wherein determining spatial dependency information between the plurality of service instances based on the plurality of directed graphs comprises:
determining the spatial dependency information between the plurality of service instances based on the plurality of directed graphs and a graph attention network.
6. The method of any of claims 1 to 5, wherein determining the index time sequence data, the log time sequence data, and the call chain time sequence data in the target micro-service system comprises:
acquiring index data, log data, and call chain data in the target micro-service system;
and performing structuring and time-series processing on the index data, the log data, and the call chain data, respectively, to obtain the index time sequence data, the log time sequence data, and the call chain time sequence data in the target micro-service system.
7. The method of any of claims 1 to 5, wherein vertex features of each directed graph are determined based on the index time sequence data and the log time sequence data, and edge features of each directed graph are determined based on the call chain time sequence data.
8. A data anomaly detection device, comprising:
a first determining module, configured to determine index time sequence data, log time sequence data, and call chain time sequence data in a target micro-service system, wherein the target micro-service system comprises a plurality of micro-services;
a second determining module, configured to determine a plurality of service instances corresponding to the plurality of micro-services based on the index time sequence data and the log time sequence data;
a third determining module, configured to determine a service call relationship between the plurality of service instances based on the call chain time sequence data;
a generation module, configured to generate a plurality of directed graphs with a time sequence relationship based on the plurality of service instances and the service call relationship between the plurality of service instances;
and a fourth determining module, configured to determine a data anomaly detection result corresponding to the target micro-service system based on the plurality of directed graphs.
9. A computer readable storage medium storing a computer program for executing the method of any one of the preceding claims 1 to 7.
10. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor being adapted to perform the method of any of the preceding claims 1 to 7.
CN202311031839.7A 2023-08-16 2023-08-16 Data anomaly detection method and device, storage medium and electronic equipment Pending CN117056166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311031839.7A CN117056166A (en) 2023-08-16 2023-08-16 Data anomaly detection method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311031839.7A CN117056166A (en) 2023-08-16 2023-08-16 Data anomaly detection method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117056166A true CN117056166A (en) 2023-11-14

Family

ID=88665831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311031839.7A Pending CN117056166A (en) 2023-08-16 2023-08-16 Data anomaly detection method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117056166A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117850403A (en) * 2024-03-06 2024-04-09 西安晟昕科技股份有限公司 Dynamic detection method for initiating and controlling system
CN117850403B (en) * 2024-03-06 2024-05-17 西安晟昕科技股份有限公司 Dynamic detection method for initiating and controlling system
CN117909910A (en) * 2024-03-19 2024-04-19 成都工业学院 Automatic detection method for system exception log based on graph attention network

Similar Documents

Publication Publication Date Title
US10977293B2 (en) Technology incident management platform
US20170109657A1 (en) Machine Learning-Based Model for Identifying Executions of a Business Process
US20210042590A1 (en) Machine learning system using a stochastic process and method
US20170109676A1 (en) Generation of Candidate Sequences Using Links Between Nonconsecutively Performed Steps of a Business Process
CN117056166A (en) Data anomaly detection method and device, storage medium and electronic equipment
US10365945B2 (en) Clustering based process deviation detection
US20170109636A1 (en) Crowd-Based Model for Identifying Executions of a Business Process
US11307916B2 (en) Method and device for determining an estimated time before a technical incident in a computing infrastructure from values of performance indicators
US20170109639A1 (en) General Model for Linking Between Nonconsecutively Performed Steps in Business Processes
US20170109638A1 (en) Ensemble-Based Identification of Executions of a Business Process
CN110490304B (en) Data processing method and device
CN116759053A (en) Medical system prevention and control method and system based on Internet of things system
CN117236677A (en) RPA process mining method and device based on event extraction
Stødle et al. Data‐driven predictive modeling in risk assessment: Challenges and directions for proper uncertainty representation
Moreno et al. Managing uncertain complex events in web of things applications
CN117316462A (en) Medical data management method
US20170109637A1 (en) Crowd-Based Model for Identifying Nonconsecutive Executions of a Business Process
US20170109670A1 (en) Crowd-Based Patterns for Identifying Executions of Business Processes
CN110209589B (en) Knowledge base system test method, device, equipment and medium
Dreves et al. Validating Data and Models in Continuous ML Pipelines.
CN116863116A (en) Image recognition method, device, equipment and medium based on artificial intelligence
CN112328899B (en) Information processing method, information processing apparatus, storage medium, and electronic device
Jesmeen et al. AUTO-CDD: automatic cleaning dirty data using machine learning techniques
CN114547231A (en) Data tracing method and system
CN117150439B (en) Automobile manufacturing parameter detection method and system based on multi-source heterogeneous data fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination