CN116107847A - Multi-element time series data anomaly detection method, device, equipment and storage medium - Google Patents

Publication number
CN116107847A
Authority
CN
China
Prior art keywords: data, variable, variables, anomaly detection, causal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310392748.XA
Other languages
Chinese (zh)
Other versions
CN116107847B (en)
Inventor
吴颖楠
王磊
肖雁飞
李娜
张孝枫
王媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202310392748.XA
Publication of CN116107847A
Application granted
Publication of CN116107847B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Debugging And Monitoring (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application relates to the technical field of data analysis, and in particular discloses a method, a device, equipment and a storage medium for detecting anomalies in multivariate time series data using a deep learning algorithm. The method comprises: acquiring operation and maintenance data of at least one moment, the operation and maintenance data comprising at least one variable; determining, based on an anomaly detection model, the causal relationships among the variables to obtain at least one independent variable and at least one causal relationship group; and performing, by the anomaly detection model, anomaly detection separately on the target variable sequence in each causal relationship group and on the independent variable sequence, to obtain the data anomaly probability. The variables of the operation and maintenance data are thus divided into causal relationship groups and independent variables according to the causal relationships among them: the causal relationships are taken into account during anomaly detection, variables without a causal relationship are analyzed independently, and only one variable in each causal relationship group needs to undergo anomaly detection, which reduces the workload and improves the efficiency of anomaly detection.

Description

Multi-element time series data anomaly detection method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data analysis technologies, and in particular to a method, an apparatus, a device, and a storage medium for detecting anomalies in multivariate time series data.
Background
Effective anomaly detection methods are widely applied in many real-world fields, such as quantitative trading (credit card theft and unusually large expenditures), user login, network security detection, autonomous driving, and routine maintenance of large industrial equipment, and anomaly detection algorithms are of great significance to these applications and services. However, a conventional anomaly detection model, for example a DNN (Deep Neural Network) model, does not consider the causal relationships among variables when performing anomaly detection, so every variable sequence in the time series data has to be detected one by one. The detection volume is large, which lowers the efficiency of data detection. How to improve the efficiency of data detection has therefore become a problem to be solved urgently.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for detecting anomalies in multivariate time series data, so as to improve the efficiency of anomaly detection.
In a first aspect, the present application provides a method for detecting anomalies in multivariate time series data, the method comprising:
acquiring operation and maintenance data of at least one moment, wherein the operation and maintenance data comprises at least one variable;
based on an anomaly detection model, determining causal relationships among variables in the operation and maintenance data, and obtaining at least one independent variable and at least one causal relationship group;
performing, by the anomaly detection model, anomaly detection separately on a target variable sequence in the causal relationship group and on an independent variable sequence corresponding to the independent variable, to obtain the data anomaly probability of the operation and maintenance data, for use by operation and maintenance personnel in data maintenance.
In a second aspect, the present application further provides a multivariate time series data anomaly detection device, the device comprising:
the data acquisition module is used for acquiring operation and maintenance data of at least one moment, wherein the operation and maintenance data comprise at least one variable;
the variable group obtaining module is used for determining the causal relation among the variables in the operation and maintenance data based on an anomaly detection model and obtaining at least one independent variable and at least one causal relation group;
the anomaly detection module is used for performing, by means of the anomaly detection model, anomaly detection separately on the target variable sequence in the causal relationship group and on the independent variable sequence corresponding to the independent variable, to obtain the data anomaly probability of the operation and maintenance data, for use by operation and maintenance personnel in data maintenance.
In a third aspect, the present application also provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement the multivariate time series data anomaly detection method described above when the computer program is executed.
In a fourth aspect, the present application also provides a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to implement a multivariate time series data anomaly detection method as described above.
The application discloses a method, a device, equipment and a storage medium for detecting anomalies in multivariate time series data. Operation and maintenance data of at least one moment are acquired, the operation and maintenance data comprising at least one variable; based on an anomaly detection model, the causal relationships among the variables in the operation and maintenance data are determined, and at least one independent variable and at least one causal relationship group are obtained; the anomaly detection model then performs anomaly detection separately on a target variable sequence in the causal relationship group and on an independent variable sequence corresponding to the independent variable, to obtain the data anomaly probability of the operation and maintenance data, for use by operation and maintenance personnel in data maintenance. The variables of the operation and maintenance data are thus divided into causal relationship groups and independent variables according to the causal relationships among them: the causal relationships are taken into account during anomaly detection, variables without a causal relationship are analyzed independently, and only one variable in each causal relationship group needs to undergo anomaly detection, which reduces the detection workload and improves the efficiency of anomaly detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a first embodiment of a multivariate time series data anomaly detection method provided by embodiments of the present application;
FIG. 2 is a schematic flow chart diagram of a second embodiment of a multivariate time series data anomaly detection method provided by embodiments of the present application;
FIG. 3 is a schematic diagram of the pairwise causal relationship calculation between variables in a multivariate time series data anomaly detection method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of a third embodiment of a multivariate time series data anomaly detection method provided by embodiments of the present application;
FIG. 5 is a schematic block diagram of a multivariate time series data anomaly detection device provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The embodiments of the present application provide a method, a device, equipment and a storage medium for detecting anomalies in multivariate time series data. The method can be applied to a server; by performing anomaly detection on only one variable in each causal relationship group, it reduces the detection workload and improves the efficiency of anomaly detection. The server may be an independent server or a server cluster.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flowchart of a method for detecting anomalies in multivariate time series data according to an embodiment of the present application. The multivariate time series data anomaly detection method can be applied to a server and is used for reducing detection workload and improving anomaly detection efficiency by carrying out anomaly detection on one variable in a causal relationship group.
As shown in fig. 1, the multivariate time series data anomaly detection method specifically includes steps S101 to S103.
S101, acquiring operation and maintenance data of at least one moment, wherein the operation and maintenance data comprises at least one variable.
In one embodiment, the data requiring anomaly detection, such as server operation and maintenance data, is acquired. The data covers at least one moment and contains at least one variable, for example the five variables X, Y, Z, D and M.
In one embodiment, the acquired operation and maintenance data is multivariate time series data in which each variable has a corresponding value at each moment. Acquiring the operation and maintenance data of at least one moment therefore yields, for each variable, a time series containing at least one value, for example {X} = {X(1), X(2), X(3), …, X(n)}, n ≥ 1.
S102, based on an anomaly detection model, determining causal relations among variables in the operation and maintenance data, and obtaining at least one independent variable and at least one causal relation group.
In one embodiment, the anomaly detection model includes a causal relationship determining module and an anomaly detection module. The causal relationship determining module performs the causal relationship calculation on each variable in the operation and maintenance data and determines the causal relationships of each variable.
In one embodiment, two variables and time sequences corresponding to the variables are arbitrarily extracted from the operation and maintenance data, the time sequences are reconstructed, correlation coefficients between the reconstructed time sequences are calculated, and the obtained correlation coefficients are used as the correlation coefficients between the two extracted variables.
In one embodiment, the correlation coefficient is an indicator of causal strength: the larger its value, the stronger the causal effect of one variable on the other. The coefficient is therefore calculated in both directions for each extracted pair of variables. For example, if the operation and maintenance data contain the variables X, Y, Z, D and M and the pair X and Y is extracted, the correlation coefficient $\beta_{x,y}$ of X on Y and the correlation coefficient $\beta_{y,x}$ of Y on X are calculated, and comparing the magnitudes of $\beta_{x,y}$ and $\beta_{y,x}$ yields the directed causal relationship group.
In one embodiment, when $\beta_{x,y}$ is greater than $\beta_{y,x}$, the directed causal relationship group is determined to be (X, Y), indicating that the causal effect of X on Y is the stronger one, and the variable sequence corresponding to the variable X is taken as the variable sequence on which anomaly detection is performed within (X, Y). When $\beta_{y,x}$ is greater than $\beta_{x,y}$, the directed causal relationship group is determined to be (Y, X), i.e. the causal effect of Y on X is the stronger one, and the variable sequence corresponding to the variable Y is taken as the target variable sequence to be detected in (Y, X).
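The comparison rule described here can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `directed_group`, the example coefficient values and the tie-breaking rule are all assumptions introduced for the example.

```python
from typing import Tuple


def directed_group(var_a: str, var_b: str,
                   beta_ab: float, beta_ba: float) -> Tuple[str, str]:
    """Return the directed pair (stronger-cause variable, other variable).

    beta_ab is the coefficient of var_a on var_b, beta_ba the reverse.
    The first element of the result is the variable whose sequence is
    used as the target variable sequence for anomaly detection.
    """
    # Tie-breaking is not specified in the text; keeping the given order
    # on equality is an assumption made for this sketch.
    if beta_ab >= beta_ba:
        return (var_a, var_b)
    return (var_b, var_a)


# Hypothetical coefficient values, for illustration only.
group = directed_group("X", "Y", beta_ab=0.82, beta_ba=0.41)
target_variable = group[0]
```

With these made-up values the group is `("X", "Y")`, so only the sequence of X would be passed on to anomaly detection.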
Before the causal relationships among the variables in the operation and maintenance data are determined based on the anomaly detection model and the at least one independent variable and at least one causal relationship group are obtained, the method further comprises: acquiring historical operation and maintenance data and a historical data anomaly record; detecting the historical operation and maintenance data based on a pre-training model to obtain the data anomaly probability corresponding to the historical operation and maintenance data; determining the detection accuracy of the pre-training model's anomaly detection based on the historical data anomaly record and the data anomaly probability; and, when the detection accuracy is greater than a preset accuracy threshold, taking the pre-training model as the anomaly detection model.
In one embodiment, the historical operation and maintenance data are processed and calculated through the pre-training model, the abnormal probability corresponding to the historical operation and maintenance data is obtained, the obtained abnormal probability is compared with the real historical data abnormal record, and then the detection accuracy of the pre-training model is obtained.
In one embodiment, when the detection accuracy is greater than a preset accuracy threshold, taking the currently trained pre-training model as an anomaly detection model; and when the detection accuracy is smaller than a preset accuracy threshold, extracting data of detection errors according to the comparison result, and re-detecting the extracted data by using the optimized model until the detection accuracy meets the requirement.
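The acceptance check on the pre-training model can be illustrated as follows. This is a sketch under stated assumptions: the text does not say how a probability is turned into an anomaly decision or how accuracy is computed, so the 0.5 decision threshold, the 0.9 accuracy threshold and the function names are hypothetical.

```python
from typing import List


def detection_accuracy(probs: List[float], labels: List[bool],
                       prob_threshold: float = 0.5) -> float:
    """Share of historical points whose predicted flag matches the anomaly record."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    # A point counts as a hit when (probability > threshold) agrees with the record.
    hits = sum((p > prob_threshold) == y for p, y in zip(probs, labels))
    return hits / len(probs)


def accept_model(probs: List[float], labels: List[bool],
                 accuracy_threshold: float = 0.9) -> bool:
    """True when the pre-training model may be used as the anomaly detection model."""
    return detection_accuracy(probs, labels) > accuracy_threshold


# Hypothetical historical probabilities and the matching anomaly record.
history_probs = [0.05, 0.10, 0.92, 0.20, 0.88]
history_labels = [False, False, True, False, True]
accepted = accept_model(history_probs, history_labels)
```

When `accept_model` returns `False`, the misdetected points would be collected and the model optimized and re-checked, as the paragraph above describes.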
S103, the anomaly detection model performs anomaly detection separately on the target variable sequence in the causal relationship group and on the independent variable sequence corresponding to the independent variable, to obtain the data anomaly probability of the operation and maintenance data, for use by operation and maintenance personnel in data maintenance.
In one embodiment, anomaly detection is performed on a sequence of target variables and a sequence of independent variables in the causal group using an anomaly detection model, the sequence of target variables being determined based on the strength of causal effects between variables in the causal group.
In a specific embodiment, the anomaly detection model calculates the related distance data between the data points of the variable sequence. For example, if (X, Y) is a causal relationship group and the causal effect of X on Y is the stronger one, anomaly detection is performed on the variable sequence {X} = {X(1), X(2), X(3), …, X(n+1)}. The following quantities are defined, where X(i) and X(j) are any two data points of the variable X:
$d_{(x(i),x(j))} = d_{x(i)} - d_{x(j)}$, where $d_{(x(i),x(j))}$ is the distance between X(i) and X(j);
$d\_k_{x(i)} = d_{x(i)} - d_{x(ik)}$, where $d\_k_{x(i)}$ is the k-th distance of the point X(i) and X(ik) is the point lying at the k-th distance from X(i);
$\mathrm{reachdist}_k(x(i),x(j)) = \max\{d\_k_{x(i)},\ d_{(x(i),x(j))}\}$, where the reachable distance $\mathrm{reachdist}_k(x(i),x(j))$ is the larger of the k-th distance of the point X(i) and the distance from the point X(i) to the point X(j).
Finally,
[Formula image SMS_1: local reachable density $lrd_k(X(i))$; not reproduced in this text]
where $lrd_k(X(i))$ is the local reachable density and $N_k(X(i))$ refers to the neighborhood points of X(i), whose reachable distances to X(i) enter the formula. According to this definition, the LOF factor obtained for the variable X is $\mathrm{LOF}_x$, given by
[Formula image SMS_2: LOF factor $\mathrm{LOF}_x$; not reproduced in this text]
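The two formula images referenced above are not reproduced in this text. For orientation only, the standard Local Outlier Factor definitions are given below with generic points $p$ and $o$; that the patent's formulas match them term by term is an assumption, not something the text confirms.

```latex
\mathrm{reachdist}_k(p, o) = \max\{\, d_k(o),\; d(p, o) \,\}

\mathrm{lrd}_k(p) = \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \mathrm{reachdist}_k(p, o) \right)^{-1}

\mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}
```

Here $d_k(o)$ is the k-th distance of the neighbor $o$ and $N_k(p)$ is the set of the k nearest neighbors of $p$. Note that the standard formulation takes the k-th distance of the neighbor $o$, whereas the definition in the text is worded in terms of the k-th distance of X(i) itself.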
In one embodiment, the anomaly likelihood of the remaining variables in the causal relationship group is defined from the anomaly detection result, namely the LOF (Local Outlier Factor) score. For example, if the LOF factor sequence of the variable X is $\mathrm{LOF}_x$, the $\mathrm{LOF}_x$ sequence is further normalized, and the normalized sequence $\mathrm{LOF}_x'$ is defined as the data anomaly likelihood value of the remaining variable Y in the causal relationship group (X, Y).
In one embodiment, when the anomaly probability value exceeds a preset probability threshold, the data is indicated to be anomalous, and the data is recalled.
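A minimal sketch of this thresholding step, assuming the anomaly probabilities are already available as a list; the function name and the threshold value of 0.8 are hypothetical.

```python
from typing import List


def flag_anomalies(probabilities: List[float], threshold: float = 0.8) -> List[int]:
    """Return the indices (time steps) whose anomaly probability exceeds the threshold."""
    return [i for i, p in enumerate(probabilities) if p > threshold]


# Hypothetical probabilities for four time steps.
flagged = flag_anomalies([0.10, 0.95, 0.30, 0.81], threshold=0.8)
```

The returned indices identify the data points that would be reported to operation and maintenance personnel.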
Referring to fig. 2, fig. 2 is a schematic flowchart of a second embodiment of the multivariate time series data anomaly detection method according to an embodiment of the present application. The multivariate time series data anomaly detection method can be applied to a server and is used for reducing detection workload and improving anomaly detection efficiency by carrying out anomaly detection on one variable in a causal relationship group.
As shown in fig. 2, the multivariate time series data anomaly detection method specifically includes steps S201 to S203.
S201, judging whether a causal relationship exists among the variables of the operation and maintenance data based on the causal relationship determining module;
S202, if a causal relationship exists among the variables, taking the variables having the causal relationship as a causal relationship group;
S203, if no causal relationship exists among the variables, taking the variables as independent variables.
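Steps S201 to S203 can be sketched as a partition over all variable pairs. The sketch assumes the pairwise coefficients have already been computed and stored in a dictionary keyed by ordered pairs, and that a pair counts as causally related when the larger of its two coefficients exceeds the preset threshold; both the data layout and that decision rule are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def partition_variables(
    variables: List[str],
    beta: Dict[Tuple[str, str], float],
    threshold: float,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split variables into directed causal groups and independent variables."""
    groups: List[Tuple[str, str]] = []
    grouped = set()
    for a, b in combinations(variables, 2):
        b_ab, b_ba = beta[(a, b)], beta[(b, a)]
        # Assumed rule: a causal relationship exists if either direction
        # exceeds the preset correlation coefficient threshold.
        if max(b_ab, b_ba) > threshold:
            groups.append((a, b) if b_ab >= b_ba else (b, a))
            grouped.update((a, b))
    # Variables that appear in no group are treated as independent.
    independent = [v for v in variables if v not in grouped]
    return groups, independent


# Hypothetical coefficients for three variables.
example_beta = {
    ("X", "Y"): 0.90, ("Y", "X"): 0.50,
    ("X", "Z"): 0.10, ("Z", "X"): 0.20,
    ("Y", "Z"): 0.15, ("Z", "Y"): 0.05,
}
causal_groups, independent_vars = partition_variables(
    ["X", "Y", "Z"], example_beta, threshold=0.6
)
```

In this example X and Y form the directed group `("X", "Y")` and Z is left as an independent variable, so only the sequences of X and Z would be checked.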
Judging, based on the causal relationship determining module, whether a causal relationship exists among the variables of the operation and maintenance data includes: based on the causal relationship determining module, arbitrarily extracting two variables from the operation and maintenance data as a first variable and a second variable, and acquiring a first time sequence corresponding to the first variable and a second time sequence corresponding to the second variable; reconstructing the first time sequence and the second time sequence respectively, based on a target embedding dimension, to generate a first reconstructed time sequence and a second reconstructed time sequence; and calculating a correlation coefficient between the first reconstructed time sequence and the second reconstructed time sequence, and determining whether a causal relationship exists between the first variable and the second variable based on a preset correlation coefficient threshold and the correlation coefficient.
After the variables having a causal relationship are taken as a causal relationship group, the method further comprises: determining the target variable sequence based on the causal strength of one variable in the causal relationship group on the other variable.
In one embodiment, the causal correlation between variables is calculated to determine the causal relationships of each variable in the system under study. For example, given the time series {X}, {Y}, {Z}, {D} and {M}, take the time series {X} = {X(1), X(2), X(3), …, X(n+1)} and {Y} = {Y(1), Y(2), Y(3), …, Y(n+1)} as an example.
In one embodiment, zero-mean real Gaussian white noise is injected into the two sequences to enhance the strength and robustness of the variable matching correlation, and the reconstructed series $M_X$ and $M_Y$ are as follows:
[Formula image SMS_3: reconstructed series $M_X$; not reproduced in this text]
[Formula image SMS_4: reconstructed series $M_Y$; not reproduced in this text]
where τ = 1 is the time lag and E is the embedding dimension, a hyperparameter whose value can be determined using a genetic algorithm with the minimum mean square error as the objective function, and t = E+2, E+3, …, n, n+1. In deep learning, adding real Gaussian white noise to the data can improve the robustness and generalization ability of a model: with noise added to the input data, the model is forced to learn features that are robust to small changes in the input, so that it performs better on new, unseen data. Reconstructing the time series in turn helps to mine the correlation of the entire time series.
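Because the reconstruction formulas themselves are not reproduced here, the following sketch uses the common delay-coordinate embedding, in which each reconstructed point is the vector [x(t), x(t-τ), …, x(t-(E-1)τ)]. That form, the noise level `sigma`, the fixed seed and the function names are assumptions for illustration; the index range used in the patent may differ.

```python
import random
from typing import List


def add_white_noise(series: List[float], sigma: float = 0.01, seed: int = 0) -> List[float]:
    """Add zero-mean Gaussian white noise; a fixed seed keeps the example repeatable."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]


def delay_embed(series: List[float], E: int, tau: int = 1) -> List[List[float]]:
    """Return the rows [x(t), x(t - tau), ..., x(t - (E - 1) * tau)]."""
    start = (E - 1) * tau  # first index for which all lagged values exist
    return [[series[t - k * tau] for k in range(E)]
            for t in range(start, len(series))]


x = [float(i) for i in range(6)]
noisy_x = add_white_noise(x, sigma=0.01)
M_X = delay_embed(x, E=3, tau=1)  # embedded without noise to keep the rows readable
```

For the toy series 0, 1, …, 5 with E = 3 the first reconstructed point is `[2.0, 1.0, 0.0]`; in practice the embedding would be applied to the noise-injected series.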
In one embodiment, the Euclidean distance from each point to every other point is calculated in $M_X$ and $M_Y$, the E+1 nearest neighbor points are found, weights $W_i$ are constructed from the distances, and the weighted average of the neighbor point values is taken as the estimate of the actual values X(t) and Y(t) (formula images SMS_5 and SMS_6, not reproduced in this text). The weight $W_i$ is constructed according to the formula in image SMS_7 (not reproduced in this text), in which d[X(s), X(t)] is the Euclidean distance between X(s) and X(t).
In one embodiment, the correlation coefficient $\beta_{x,y}$ between the estimated value and the actual value is calculated. The value of $\beta_{x,y}$ is an indicator of the strength of the causal relationship: the larger $\beta_{x,y}$ is, the stronger the causal effect of the variable X on the variable Y. The calculation is carried out pair by pair, as indicated by the arrows in FIG. 3, to obtain $(\beta_{x,y}, \beta_{y,x})$, $(\beta_{x,z}, \beta_{z,x})$, $(\beta_{x,d}, \beta_{d,x})$, $(\beta_{x,m}, \beta_{m,x})$, $(\beta_{y,z}, \beta_{z,y})$, $(\beta_{y,d}, \beta_{d,y})$, $(\beta_{y,m}, \beta_{m,y})$, $(\beta_{z,d}, \beta_{d,z})$, $(\beta_{z,m}, \beta_{m,z})$ and $(\beta_{d,m}, \beta_{m,d})$.
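The neighbor search, weighting, estimation and correlation steps can be sketched together in plain Python. Since the weight formula is not reproduced in this text, the sketch borrows the exponential weighting commonly used in convergent cross mapping (weights proportional to exp(-d_i/d_1), normalized to sum to one); that choice, the delay embedding and all function names are assumptions. Which of the two returned values corresponds to $\beta_{x,y}$ depends on a direction convention that the text does not settle, so the sketch uses neutral names.

```python
import math
from typing import List


def delay_embed(series: List[float], E: int, tau: int = 1) -> List[List[float]]:
    """Rows are [x(t), x(t - tau), ..., x(t - (E - 1) * tau)]."""
    start = (E - 1) * tau
    return [[series[t - k * tau] for k in range(E)]
            for t in range(start, len(series))]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def cross_map_skill(source: List[float], target: List[float],
                    E: int = 2, tau: int = 1) -> float:
    """Estimate `target` from neighbors in the embedding of `source`;
    return the correlation between the estimates and the actual values."""
    M = delay_embed(source, E, tau)
    actual = target[(E - 1) * tau:]  # target values aligned with the rows of M
    estimates = []
    for i, point in enumerate(M):
        # E + 1 nearest neighbors by Euclidean distance, excluding the point itself.
        nearest = sorted((math.dist(point, other), j)
                         for j, other in enumerate(M) if j != i)[: E + 1]
        d1 = max(nearest[0][0], 1e-12)  # guard against a zero nearest distance
        u = [math.exp(-d / d1) for d, _ in nearest]
        total = sum(u)
        estimates.append(sum(w / total * actual[j] for w, (_, j) in zip(u, nearest)))
    return pearson(estimates, actual)


# Toy data: y is an exact linear function of x, so both skills should be high.
x = [math.sin(0.3 * t) for t in range(60)]
y = [2.0 * v + 1.0 for v in x]
skill_from_x = cross_map_skill(x, y, E=2)
skill_from_y = cross_map_skill(y, x, E=2)
```

Running this over every pair of the five variables gives the ten coefficient pairs listed above; `itertools.combinations(variables, 2)` is one way to enumerate them.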
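If the result $(\beta_{x,y}, \beta_{y,x})$ is calculated according to the above method, the directed causal relationship group (variable 1, variable 2) is defined as:
[Formula image SMS_8: definition of the directed causal relationship group; not reproduced in this text]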
and judging the causal relation among all the variables in the operation and maintenance data according to the algorithm and the rule, dividing the variables into at least one causal relation group, and taking the variables without causal relation as independent variables.
In one embodiment, the values of $\beta_{x,y}$ and $\beta_{y,x}$ are obtained from the above calculation. If $\beta_{x,y}$ is greater than $\beta_{y,x}$, the directed causal relationship group is determined to be (X, Y), indicating that the causal effect of the variable X on the variable Y is stronger than that of the variable Y on the variable X, and the sequence of the variable X is taken as the target variable sequence for anomaly detection.
Referring to fig. 4, fig. 4 is a schematic flowchart of a third embodiment of the multivariate time series data anomaly detection method according to an embodiment of the present application. The multivariate time series data anomaly detection method can be applied to a server and is used for reducing detection workload and improving anomaly detection efficiency by carrying out anomaly detection on one variable in a causal relationship group.
As shown in fig. 4, the multivariate time series data anomaly detection method specifically includes steps S301 to S302.
S301, based on the abnormality detection module, performing abnormality detection on the independent variable sequence and the target variable sequence to obtain local abnormality factors;
S302, normalizing the local anomaly factor to obtain a standard value of the local anomaly factor, and taking the standard value as the data anomaly probability of the operation and maintenance data.
Based on the abnormality detection module, performing abnormality detection on the independent variable sequence and the target variable sequence to obtain local abnormality factors, including: based on the abnormality detection module, obtaining related distance data between data in the variable sequence; and obtaining local abnormality factors of the abnormality of the data of the variable sequence based on the related distance data.
In one embodiment, to keep differences in scale between the variables of a complex system from affecting the results, the variables within a causal relationship group are normalized by the range method (min-max normalization) before they are detected. Taking the data column X as an example, $X_i' = (X_i - X_{min}) / (X_{max} - X_{min})$, where $X_i$ is the i-th value in the X column, $X_{min}$ is the smallest value in the X column, and $X_{max}$ is the largest value in the X column.
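The range normalization formula translates directly into code. In this short sketch the handling of a constant column (all values equal, so the denominator is zero) is not covered by the text; returning zeros in that case is an assumption.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Apply X_i' = (X_i - X_min) / (X_max - X_min) to one data column."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula is undefined; zeros are an assumed convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_normalize([10.0, 20.0, 15.0, 30.0])
```

The same function can be applied to a sequence of LOF scores to map them onto the interval [0, 1].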
In one embodiment, the distance between X(i) and X(j) is obtained by the formula $d_{(x(i),x(j))} = d_{x(i)} - d_{x(j)}$. The k-th distance of the point X(i) is then obtained by the formula $d\_k_{x(i)} = d_{x(i)} - d_{x(ik)}$. The reachable distance $\mathrm{reachdist}_k(x(i),x(j))$, i.e. the larger of the k-th distance of the point X(i) and the distance from the point X(i) to the point X(j), is calculated by the formula $\mathrm{reachdist}_k(x(i),x(j)) = \max\{d\_k_{x(i)},\ d_{(x(i),x(j))}\}$. Finally, the local reachable density $lrd_k(X(i))$ is calculated according to the formula
[Formula image SMS_9: local reachable density $lrd_k(X(i))$; not reproduced in this text]
where $N_k(X(i))$ refers to the neighborhood points of X(i), whose reachable distances to X(i) enter the formula. According to the formula
[Formula image SMS_10: LOF factor $\mathrm{LOF}_x$; not reproduced in this text]
the LOF factor $\mathrm{LOF}_x$, i.e. the local anomaly factor, is obtained.
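As an illustration of the computation chain (distance, k-th distance, reachable distance, local reachable density, LOF), here is a small pure-Python sketch for a one-dimensional sequence. It follows the standard LOF algorithm, in which the reachable distance uses the k-th distance of the neighbor; whether that matches the patent's formula images exactly is an assumption. The function name, the choice k = 2 and the sample values are hypothetical, and ties and duplicate values are handled only in a simplified way.

```python
from typing import List


def lof_scores(values: List[float], k: int = 2) -> List[float]:
    """Standard Local Outlier Factor for a 1-D sequence (simplified sketch)."""
    n = len(values)
    if n <= k:
        raise ValueError("need more than k points")

    def dist(i: int, j: int) -> float:
        return abs(values[i] - values[j])

    # k nearest neighbors and k-th distance of every point (ties cut at k).
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
        neighbours.append(order[:k])
        k_dist.append(dist(i, order[k - 1]))

    # Local reachable density: inverse of the mean reachable distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist(i, o)) for o in neighbours[i]]
        mean_reach = sum(reach) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else float("inf"))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical sequence with one obvious outlier at the end.
scores = lof_scores([1.0, 1.1, 1.2, 1.3, 1.4, 5.0], k=2)
most_anomalous = scores.index(max(scores))
```

Points inside the dense cluster receive scores near 1, while the isolated value 5.0 receives a much larger score; the score sequence can then be range-normalized to obtain the data anomaly probability.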
In one embodiment, the anomaly factor sequence is normalized by a range normalization, the normalized sequence being defined as the data anomaly likelihood value for the remaining variable Y within the causal relationship group (X, Y).
Referring to fig. 5, fig. 5 is a schematic block diagram of a multivariate time series data anomaly detection device according to an embodiment of the present application, wherein the multivariate time series data anomaly detection device is configured to perform the multivariate time series data anomaly detection method. The multivariate time series data anomaly detection device can be configured on a server.
As shown in fig. 5, the multivariate time series data anomaly detection device 400 includes:
a data acquisition module 401, configured to acquire operation and maintenance data at least one moment, where the operation and maintenance data includes at least one variable;
a variable group obtaining module 402, configured to determine a causal relationship between variables in the operation and maintenance data based on an anomaly detection model, and obtain at least one independent variable and at least one causal relationship group;
the anomaly detection module 403 is configured to perform, by means of the anomaly detection model, anomaly detection separately on the target variable sequence in the causal relationship group and on the independent variable sequence corresponding to the independent variable, so as to obtain the data anomaly probability of the operation and maintenance data, for use by operation and maintenance personnel in data maintenance.
Further, the variable group obtaining module 402 includes:
the causal relation judging unit is used for judging whether causal relation exists among the variables of the operation and maintenance data based on the causal relation determining module;
the causal relation group determining module is used for taking the variables having the causal relation as a causal relation group if the causal relation exists among the variables;
and the independent variable determining module is used for taking the variables as independent variables if the causal relation does not exist among the variables.
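The split into causal relationship groups and independent variables can be pictured with the following sketch. It assumes the pairwise causal judgements have already been made and are passed in as name pairs; the function name `split_variables` and the example variable names are hypothetical.

```python
def split_variables(names, causal_pairs):
    """Partition variables into causal relationship groups and independent variables.

    names: all variable names in the operation and maintenance data.
    causal_pairs: pairs of names judged to have a causal relationship.
    """
    groups = [tuple(pair) for pair in causal_pairs]
    linked = {name for pair in groups for name in pair}
    # A variable that appears in no causal pair is treated as independent.
    independent = [name for name in names if name not in linked]
    return groups, independent
```

For example, with variables `cpu`, `mem`, and `disk` and a single causal pair `(cpu, mem)`, the sketch yields one causal relationship group and leaves `disk` as an independent variable.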
Further, the causal relationship judging unit includes:
the variable extraction subunit is used for arbitrarily extracting two variables from the operation and maintenance data as a first variable and a second variable based on the causal relationship determination module, and acquiring a first time sequence corresponding to the first variable and a second time sequence corresponding to the second variable;
a time sequence reconstruction subunit, configured to reconstruct the first time sequence and the second time sequence based on a target embedding dimension number, respectively, to generate a first reconstructed time sequence and a second reconstructed time sequence;
and the correlation coefficient calculating subunit is used for calculating the correlation coefficient between the first reconstruction time sequence and the second reconstruction time sequence and determining whether the first variable and the second variable have causal relation or not based on a preset correlation coefficient threshold value and the correlation coefficient.
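The three subunits above can be illustrated with a minimal sketch. This passage does not spell out the reconstruction or the coefficient, so the code assumes a time-delay embedding with lag 1 and a Pearson correlation over the embedded values; the function names, the default embedding dimension of 3, the use of the absolute value, and the threshold of 0.8 are all hypothetical.

```python
import numpy as np

def delay_embed(series, dim, lag=1):
    """Rebuild a 1-D series as rows [x(t), x(t-lag), ..., x(t-(dim-1)*lag)]."""
    x = np.asarray(series, dtype=float)
    start = (dim - 1) * lag
    if start >= len(x):
        raise ValueError("series too short for this embedding dimension")
    cols = [x[start - j * lag: len(x) - j * lag] for j in range(dim)]
    return np.column_stack(cols)

def reconstruction_correlation(first, second, dim, lag=1):
    """Pearson correlation between two reconstructed (embedded) sequences.

    Assumption: both series have the same length and sampling instants.
    """
    a = delay_embed(first, dim, lag).ravel()
    b = delay_embed(second, dim, lag).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def has_causal_relation(first, second, dim=3, threshold=0.8):
    """Compare the coefficient with a preset threshold (hypothetical value)."""
    return abs(reconstruction_correlation(first, second, dim)) >= threshold
```

A correlation threshold of this kind only screens for strongly related pairs; how the threshold and embedding dimension are chosen is left to the embodiment.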
Further, the variable group obtaining module 402 further includes:
and the target variable sequence determining unit is used for determining the target variable sequence based on the causal relationship strength of the variable in the causal relationship group to another variable.
Further, the abnormality detection module 403 includes:
the local abnormality factor obtaining unit is used for carrying out abnormality detection on the independent variable sequence and the target variable sequence based on the abnormality detection module to obtain a local abnormality factor;
the data anomaly probability obtaining unit is used for normalizing the local anomaly factors to obtain standard values of the local anomaly factors, and taking the standard values as the data anomaly probabilities of the operation and maintenance data.
Further, the local abnormality factor obtaining unit includes:
a related distance data obtaining subunit, configured to obtain related distance data between data in the variable sequence based on the anomaly detection module;
and the local abnormality factor obtaining subunit is used for obtaining the local abnormality factor of the abnormality of the data of the variable sequence based on the related distance data.
Further, the multivariate time series data anomaly detection device 400 further includes a detection model obtaining module, where the detection model obtaining module includes:
the historical data acquisition unit is used for acquiring historical operation and maintenance data and historical data abnormal records;
the historical data anomaly probability obtaining unit is used for detecting the historical operation and maintenance data based on a pre-training model to obtain data anomaly probability corresponding to the historical operation and maintenance data;
the detection accuracy obtaining unit is used for determining the detection accuracy of the pre-training model for abnormality detection based on the historical data abnormal record and the data abnormal probability;
and the anomaly detection model determining unit is used for taking the pre-training model as the anomaly detection model when the detection accuracy is greater than a preset accuracy threshold.
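The acceptance check performed by these units can be sketched as follows. The application does not state how a probability is compared with an anomaly record, so the sketch assumes a probability at or above a cutoff counts as a predicted anomaly; the function names, the cutoff of 0.5, and the accuracy threshold of 0.9 are hypothetical.

```python
def detection_accuracy(anomaly_probabilities, anomaly_records, cutoff=0.5):
    """Fraction of historical samples where the prediction matches the record."""
    if len(anomaly_probabilities) != len(anomaly_records) or not anomaly_records:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumption: probability >= cutoff means the sample is flagged as anomalous.
    predicted = [p >= cutoff for p in anomaly_probabilities]
    hits = sum(pred == bool(rec) for pred, rec in zip(predicted, anomaly_records))
    return hits / len(predicted)

def accept_model(anomaly_probabilities, anomaly_records, accuracy_threshold=0.9):
    """Use the pre-training model only if its accuracy exceeds the threshold."""
    accuracy = detection_accuracy(anomaly_probabilities, anomaly_records)
    return accuracy > accuracy_threshold
```

If the check fails, the pre-training model would not be taken as the anomaly detection model; what happens next (for example, further training) is outside this sketch.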
It should be noted that, for convenience and brevity of description, the specific working process of the apparatus and each module described above may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The apparatus described above may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 6.
Referring to fig. 6, fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server.
With reference to FIG. 6, the computer device includes a processor, memory, and a network interface connected by a system bus, where the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause the processor to perform any one of the multivariate time series data anomaly detection methods.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any one of the multivariate time series data anomaly detection methods.
The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In one embodiment, the processor is configured to run a computer program stored in the memory to implement the steps of:
acquiring operation and maintenance data of at least one moment, wherein the operation and maintenance data comprises at least one variable;
based on an anomaly detection model, determining causal relationships among variables in the operation and maintenance data, and obtaining at least one independent variable and at least one causal relationship group;
the anomaly detection model respectively carries out anomaly detection on a target variable sequence in the causal relationship group and an independent variable sequence corresponding to the independent variable to obtain the data anomaly probability of the operation and maintenance data so as to be used for operation and maintenance personnel to carry out data maintenance.
In one embodiment, when determining the causal relationships among the variables in the operation and maintenance data based on the anomaly detection model to obtain at least one independent variable and at least one causal relationship group, the processor is configured to implement:
judging whether causal relation exists among variables of the operation and maintenance data based on the causal relation determining module;
if the causal relationship exists among the variables, the variable with the causal relationship is used as a causal relationship group;
and if the causal relationship does not exist among the variables, the variables are used as independent variables.
In one embodiment, when judging whether a causal relationship exists among the variables of the operation and maintenance data based on the causal relationship determining module, the processor is configured to implement:
based on the causal relationship determining module, arbitrarily extracting two variables from the operation and maintenance data to serve as a first variable and a second variable, and acquiring a first time sequence corresponding to the first variable and a second time sequence corresponding to the second variable;
reconstructing the first time sequence and the second time sequence based on a target embedding dimension number, respectively, to generate a first reconstructed time sequence and a second reconstructed time sequence;
and calculating a correlation coefficient between the first reconstruction time sequence and the second reconstruction time sequence, and determining whether a causal relationship exists between the first variable and the second variable based on a preset correlation coefficient threshold and the correlation coefficient.
In one embodiment, after taking the variables having the causal relationship as a causal relationship group if the causal relationship exists among the variables, the processor is further configured to implement:
determining the target variable sequence based on the strength of the causal relationship of one variable in the causal relationship group on another variable.
In one embodiment, when performing anomaly detection, through the anomaly detection model, on the target variable sequence in the causal relationship group and on the independent variable sequence corresponding to the independent variable to obtain the data anomaly probability of the operation and maintenance data for operation and maintenance personnel to perform data maintenance, the processor is configured to implement:
based on the abnormality detection module, performing abnormality detection on the independent variable sequence and the target variable sequence to obtain local abnormality factors;
and normalizing the local abnormal factor to obtain a standard value of the local abnormal factor, and taking the standard value as the data abnormal probability of the operation and maintenance data.
In one embodiment, when performing abnormality detection on the independent variable sequence and the target variable sequence based on the abnormality detection module to obtain a local abnormality factor, the processor is configured to implement:
based on the abnormality detection module, obtaining related distance data between data in the variable sequence;
and obtaining local abnormality factors of the abnormality of the data of the variable sequence based on the related distance data.
In one embodiment, before determining the causal relationships among the variables in the operation and maintenance data based on the anomaly detection model to obtain at least one independent variable and at least one causal relationship group, the processor is further configured to implement:
acquiring historical operation and maintenance data and an abnormal record of the historical data;
detecting the historical operation and maintenance data based on a pre-training model to obtain data anomaly probability corresponding to the historical operation and maintenance data;
determining the detection accuracy of abnormality detection of the pre-training model based on the historical data abnormality record and the data abnormality probability;
and when the detection accuracy is greater than a preset accuracy threshold, taking the pre-training model as the anomaly detection model.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, the computer program comprises program instructions, and when the program instructions are executed by a processor, the processor implements any multivariate time series data anomaly detection method provided by the embodiment of the application.
The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.
While the application has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for detecting anomalies in multivariate time series data, comprising:
acquiring operation and maintenance data of at least one moment, wherein the operation and maintenance data comprises at least one variable;
based on an anomaly detection model, determining causal relationships among variables in the operation and maintenance data, and obtaining at least one independent variable and at least one causal relationship group;
the anomaly detection model respectively carries out anomaly detection on a target variable sequence in the causal relationship group and an independent variable sequence corresponding to the independent variable to obtain the data anomaly probability of the operation and maintenance data so as to be used for operation and maintenance personnel to carry out data maintenance.
2. The method of claim 1, wherein the anomaly detection model includes a causal relationship determination module, and wherein the determining the causal relationship between the variables in the operation and maintenance data based on the anomaly detection model, to obtain at least one independent variable and at least one causal relationship group, comprises:
judging whether causal relation exists among variables of the operation and maintenance data based on the causal relation determining module;
if the causal relationship exists among the variables, the variable with the causal relationship is used as a causal relationship group;
and if the causal relationship does not exist among the variables, the variables are used as independent variables.
3. The method for detecting anomalies in multivariate time series data according to claim 2, wherein the determining whether there is a causal relationship between the variables of the operation and maintenance data based on the causal relationship determination module includes:
based on the causal relationship determining module, arbitrarily extracting two variables from the operation and maintenance data to serve as a first variable and a second variable, and acquiring a first time sequence corresponding to the first variable and a second time sequence corresponding to the second variable;
reconstructing the first time sequence and the second time sequence based on a target embedding dimension number, respectively, to generate a first reconstructed time sequence and a second reconstructed time sequence;
and calculating a correlation coefficient between the first reconstruction time sequence and the second reconstruction time sequence, and determining whether a causal relationship exists between the first variable and the second variable based on a preset correlation coefficient threshold and the correlation coefficient.
4. The method according to claim 2, wherein if the causal relationship exists between the variables, the method further comprises, after taking the causal-related variables as a causal relationship group:
the sequence of target variables is determined based on the causal strength of the variable within the causal group against another variable.
5. The method for anomaly detection of multivariate time series data according to claim 1, wherein the anomaly detection model includes an anomaly detection module, and the anomaly detection model performs anomaly detection on a target variable sequence in the causal relationship group and an independent variable sequence corresponding to the independent variable respectively, obtains a data anomaly probability of the operation and maintenance data, for operation and maintenance personnel to perform data maintenance, and includes:
based on the abnormality detection module, performing abnormality detection on the independent variable sequence and the target variable sequence to obtain local abnormality factors;
and normalizing the local abnormal factor to obtain a standard value of the local abnormal factor, and taking the standard value as the data abnormal probability of the operation and maintenance data.
6. The method for anomaly detection of multivariate time series data according to claim 5, wherein the anomaly detection of the independent variable sequence and the target variable sequence based on the anomaly detection module, to obtain local anomaly factors, comprises:
based on the abnormality detection module, obtaining related distance data between data in a variable sequence;
and obtaining local abnormality factors of the abnormality of the data of the variable sequence based on the related distance data.
7. The method for anomaly detection of multivariate time series data according to any one of claims 1 to 6, wherein prior to determining causal relationships between variables in the operation and maintenance data based on an anomaly detection model to obtain at least one independent variable and at least one causal relationship group, the method further comprises:
acquiring historical operation and maintenance data and an abnormal record of the historical data;
detecting the historical operation and maintenance data based on a pre-training model to obtain data anomaly probability corresponding to the historical operation and maintenance data;
determining the detection accuracy of abnormality detection of the pre-training model based on the historical data abnormality record and the data abnormality probability;
and when the detection accuracy is greater than a preset accuracy threshold, taking the pre-training model as the anomaly detection model.
8. A multivariate time series data anomaly detection device comprising:
the data acquisition module is used for acquiring operation and maintenance data of at least one moment, wherein the operation and maintenance data comprise at least one variable;
the variable group obtaining module is used for determining the causal relation among the variables in the operation and maintenance data based on an anomaly detection model and obtaining at least one independent variable and at least one causal relation group;
the anomaly detection module is used for respectively carrying out anomaly detection on the target variable sequence in the causal relationship group and the independent variable sequence corresponding to the independent variable by the anomaly detection model to obtain the data anomaly probability of the operation and maintenance data so as to be used for the operation and maintenance personnel to carry out data maintenance.
9. A computer device, the computer device comprising a memory and a processor;
the memory is used for storing a computer program;
the processor configured to execute the computer program and implement the multivariate time series data anomaly detection method according to any one of claims 1 to 7 when the computer program is executed.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the multivariate time series data anomaly detection method of any one of claims 1 to 7.
CN202310392748.XA 2023-04-13 2023-04-13 Multi-element time series data anomaly detection method, device, equipment and storage medium Active CN116107847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310392748.XA CN116107847B (en) 2023-04-13 2023-04-13 Multi-element time series data anomaly detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116107847A true CN116107847A (en) 2023-05-12
CN116107847B CN116107847B (en) 2023-06-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant