CN118114180A - Time series data abnormity detection method, device, equipment, storage medium and program product - Google Patents

Time series data abnormity detection method, device, equipment, storage medium and program product

Info

Publication number
CN118114180A
Authority
CN
China
Prior art keywords
data sequence
data
sequence
result
trend
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410329000.XA
Other languages
Chinese (zh)
Inventor
徐乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202410329000.XA
Publication of CN118114180A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure provides a method, apparatus, device, storage medium, and program product for detecting anomalies in time series data, which can be applied to the technical fields of artificial intelligence and financial technology. The method comprises the following steps: acquiring a first data sequence of an N-th time period and a second data sequence of an M-th time period, wherein the N-th time period is later than the M-th time period; obtaining a waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence; calculating a second data sequence trend result according to the time information of the N-th time period and the time information of the M-th time period, wherein the second data sequence trend result represents the data change trend of the second data sequence; obtaining a detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result; and displaying the detection result on an interactive interface when the detection result of the first data sequence indicates an anomaly.

Description

Time series data abnormity detection method, device, equipment, storage medium and program product
Technical Field
The present disclosure relates to the field of artificial intelligence and financial technology, and in particular, to a method, apparatus, device, storage medium, and program product for detecting anomalies in time series data.
Background
With the development of financial technology, anomaly detection is performed on finance-related time series data so that risk events can be identified, preventive action can be taken in time, and losses can be avoided. Time series anomaly detection refers to methods for finding data points or data sets in time series data that do not conform to an expected pattern. Anomalies may be due to system faults, noise, artifacts, etc., and may also be the result of normal changes. For example, the device operation data, transaction data, and application traffic data generated when a financial institution handles a service may be abnormal at a single time point or over a period of time. When an anomaly is detected in the monitored time series data, the relevant operation and maintenance personnel can be promptly alerted to the abnormal data.
Time series anomaly detection is conventionally carried out with statistical methods, machine learning methods, and the like; however, because time series data is large in volume and computationally complex to process, the accuracy of such detection is low.
Disclosure of Invention
In view of the above, the present disclosure provides a time series data anomaly detection method, apparatus, device, storage medium, and program product.
According to a first aspect of the present disclosure, there is provided a time series data anomaly detection method including: acquiring a first data sequence of an N-th time period and a second data sequence of an M-th time period, wherein N and M are both positive numbers, the N-th time period is later than the M-th time period, and the first data sequence and the second data sequence are each formed by a plurality of service data arranged in time order; obtaining a waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence; calculating a second data sequence trend result according to the time information of the N-th time period and the time information of the M-th time period, wherein the second data sequence trend result represents the data change trend of the second data sequence; obtaining a detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result; and displaying the detection result on an interactive interface when the detection result of the first data sequence indicates an anomaly.
According to an embodiment of the present disclosure, obtaining the waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence includes: calculating a sequence average similarity according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence; and obtaining the waveform similarity result between the first data sequence and the second data sequence according to the sequence average similarity.
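The sequence average similarity can be illustrated with a minimal sketch. It assumes that the similarity is measured as a Euclidean distance (one of the measures the description names later) and that "average" means dividing that distance by the data amount of the first data sequence; the function names and the threshold value are hypothetical and do not come from the disclosure.

```python
import math


def average_sequence_similarity(first, second):
    """Euclidean distance between two equal-length sequences,
    divided by the data amount (length) of the first sequence.

    Assumption: a smaller value means the waveforms are more alike.
    """
    if not first or len(first) != len(second):
        raise ValueError("sequences must be non-empty and of equal length")
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)))
    return distance / len(first)


def waveform_similarity_result(first, second, threshold=1.0):
    """Return True when the waveforms are judged similar.

    The threshold is a hypothetical tuning parameter.
    """
    return average_sequence_similarity(first, second) <= threshold
```

Dividing by the data amount keeps the score comparable between periods that contain different numbers of samples, which is one plausible reading of why the data amount enters the calculation.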
According to an embodiment of the present disclosure, calculating the second data sequence trend result according to the time information of the N-th time period and the time information of the M-th time period includes: calculating a difference sequence between the first data sequence and the second data sequence according to the time information of the N-th time period and the time information of the M-th time period; and detecting the data change trend of the difference sequence based on a trend detection algorithm to obtain the second data sequence trend result.
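One way to read the difference-sequence step is sketched below. It assumes that the time information is used to pair up samples that sit at the same offset within their respective periods (for example, the same minute of the day); the pairing rule and the name `difference_sequence` are illustrative assumptions, not taken from the disclosure.

```python
def difference_sequence(first, second):
    """Pair samples by their offset within the period and subtract.

    Each input is a list of (offset, value) tuples, where offset is the
    position of the sample inside its own time period. Samples without a
    counterpart at the same offset are skipped (an assumption).
    """
    second_by_offset = dict(second)
    return [
        value - second_by_offset[offset]
        for offset, value in first
        if offset in second_by_offset
    ]


# Usage: offsets 0 and 1 exist in both periods, offset 2 only in the first.
today = [(0, 10), (1, 12), (2, 15)]
yesterday = [(0, 9), (1, 10), (3, 1)]
print(difference_sequence(today, yesterday))
```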
According to an embodiment of the present disclosure, detecting the data change trend of the difference sequence based on the trend detection algorithm to obtain the second data sequence trend result includes: detecting the data change trend of the difference sequence based on the trend detection algorithm to obtain a difference trend result; and obtaining the second data sequence trend result when the difference trend result indicates that the difference sequence has an upward trend.
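The disclosure does not name the trend detection algorithm. As an illustration only, the sketch below substitutes the Mann-Kendall test, a common non-parametric trend test; the choice of test, the one-sided critical value of 1.645, and the function name are all assumptions. The variance formula omits the correction for tied values.

```python
import math


def mann_kendall_upward(values, z_critical=1.645):
    """Return True if the Mann-Kendall test indicates an upward trend.

    S counts concordant minus discordant pairs; the normal approximation
    (without tie correction) converts S into a Z score.
    """
    n = len(values)
    if n < 3:
        return False
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    if s <= 0:
        return False  # no evidence of an increase
    variance = n * (n - 1) * (2 * n + 5) / 18
    z = (s - 1) / math.sqrt(variance)
    return z > z_critical
```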
According to an embodiment of the present disclosure, obtaining the second data sequence trend result when the difference trend result indicates that the difference sequence has an upward trend includes: when the difference trend result indicates that the difference sequence has an upward trend, determining, from the difference sequence, a difference subsequence that has the upward trend; fitting the difference subsequence based on a least squares algorithm to obtain a fitted difference subsequence; and obtaining the second data sequence trend result according to the rising base point and the rising absolute value of the fitted difference subsequence.
According to an embodiment of the present disclosure, obtaining the second data sequence trend result according to the rising base point and the rising absolute value of the fitted difference subsequence includes: calculating a rising ratio according to the rising base point and the rising absolute value of the fitted difference subsequence; and, when the rising ratio meets a rising ratio threshold, obtaining the second data sequence trend result based on the difference data in the difference sequence and a difference sequence mean, wherein the difference sequence mean is the mean of the difference data in the difference sequence.
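The least squares fit and the rising ratio can be sketched as follows. The disclosure does not define the rising base point, the rising absolute value, or the ratio, so this sketch assumes the base point is the fitted value at the start of the subsequence, the absolute value is the fitted end value minus the fitted start value, and the ratio is their quotient. All names and the threshold are hypothetical.

```python
def least_squares_line(values):
    """Fit y = slope * x + intercept to values at x = 0, 1, ..., n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def rising_ratio(subsequence):
    """Rise of the fitted line relative to its starting value (assumed definition)."""
    slope, intercept = least_squares_line(subsequence)
    base_point = intercept                      # fitted value at the first sample
    rise = slope * (len(subsequence) - 1)       # fitted end minus fitted start
    if base_point == 0:
        raise ValueError("rising ratio is undefined for a zero base point")
    return abs(rise) / abs(base_point)


def meets_rising_threshold(subsequence, threshold=0.5):
    """Hypothetical check of the rising ratio against a threshold."""
    return rising_ratio(subsequence) >= threshold
```

Working from the fitted line rather than the raw first and last samples makes the ratio less sensitive to a single noisy endpoint.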
According to an embodiment of the present disclosure, the above method further includes: acquiring a third data sequence of a K-th time period, wherein K is a positive number and the M-th time period is later than the K-th time period; and inputting the third data sequence into a regression model to obtain a third data sequence trend result.
According to an embodiment of the present disclosure, inputting the third data sequence into the regression model to obtain the third data sequence trend result includes: inputting the third data sequence into the regression model to obtain a fitted slope; and obtaining the third data sequence trend result based on the fitted slope.
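A minimal sketch of turning a fitted slope into a trend result is shown below. It assumes the regression model is ordinary linear regression over the sample index and that the trend result is one of three labels; the labels, the tolerance, and the function names are illustrative assumptions.

```python
def fitted_slope(values):
    """Slope of the ordinary least squares line through (index, value) pairs."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    # Centering x alone is enough because the centered x values sum to zero.
    sxy = sum((x - x_mean) * y for x, y in enumerate(values))
    return sxy / sxx


def third_sequence_trend(values, tolerance=1e-6):
    """Map the fitted slope to a hypothetical trend label."""
    slope = fitted_slope(values)
    if slope > tolerance:
        return "rising"
    if slope < -tolerance:
        return "falling"
    return "flat"
```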
According to an embodiment of the present disclosure, obtaining the detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result further includes: obtaining the detection result of the first data sequence according to the waveform similarity result, the second data sequence trend result, and the third data sequence trend result.
According to an embodiment of the present disclosure, the first data in the first data sequence or the second data in the second data sequence includes at least one of the following: transaction data, device operation data, network traffic data.
A second aspect of the present disclosure provides a time series data anomaly detection apparatus, including: an acquiring module configured to acquire a first data sequence of an N-th time period and a second data sequence of an M-th time period, wherein N and M are both positive numbers, the N-th time period is later than the M-th time period, and the first data sequence and the second data sequence are each formed by a plurality of data arranged in time order; a first obtaining module configured to obtain a waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence; a calculation module configured to calculate a second data sequence trend result according to the time information of the N-th time period and the time information of the M-th time period, wherein the second data sequence trend result represents the data change trend of the second data sequence; a second obtaining module configured to obtain a detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result; and a display module configured to display the detection result on an interactive interface when the detection result of the first data sequence indicates an anomaly.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method described above.
A fourth aspect of the present disclosure also provides a computer readable storage medium having stored thereon an executable computer program which when executed by a processor performs the steps of the above method.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described above.
According to the time series data anomaly detection method, apparatus, device, storage medium, and program product of the present disclosure, a first data sequence of the N-th time period and a second data sequence of the M-th time period are acquired. A waveform similarity result between the first data sequence and the second data sequence is obtained according to the similarity between the two sequences and the data amount of the first data sequence; because it takes the influence of the data amount on the time series data into account, the waveform similarity result reflects the similarity of the data sequence as a whole, which reduces the complexity of subsequent calculation. A second data sequence trend result is calculated according to the time information of the N-th time period and the time information of the M-th time period. A detection result of the first data sequence is then obtained according to the waveform similarity result and the second data sequence trend result. Because both the waveform similarity and the trend change between the time series are considered, the accuracy of time series data anomaly detection is improved.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a time series data anomaly detection method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method of time series data anomaly detection in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flowchart of calculating a second data sequence trend result according to time information of an N-th time period and time information of an M-th time period according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method of time series data anomaly detection in accordance with another embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a time series data anomaly detection apparatus according to an embodiment of the present disclosure; and
FIG. 6 schematically illustrates a block diagram of an electronic device adapted to implement a time series data anomaly detection method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a convention should be interpreted in the sense in which one of skill in the art would generally understand it (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
In the technical solution of the present disclosure, the user information involved (including, but not limited to, user personal information, user image information, and user equipment information such as location information) and the data involved (including, but not limited to, data for analysis, stored data, and displayed data) are information and data authorized by the user or fully authorized by each party. The collection, storage, use, processing, transmission, provision, disclosure, and application of the related data comply with relevant laws, regulations, and standards; necessary security measures are taken; public order and good morals are not violated; and corresponding operation entries are provided for the user to grant or refuse authorization.
In scenarios where personal information is used to make automated decisions, the method, apparatus, and system provided by the embodiments of the present disclosure provide corresponding operation entries so that users can choose to accept or reject the automated decision result; if the user chooses to reject it, an expert decision process is entered. The expression "automated decision" here refers to the activity of automatically analyzing and assessing an individual's behavioral habits, hobbies, or economic, health, or credit status by means of a computer program, and making a decision. The expression "expert decision" here refers to the activity of a decision being made by a person who specializes in a certain field of work, has specialized experience, knowledge, and skills, and has reached a certain level of expertise.
An embodiment of the present disclosure provides a time series data anomaly detection method, which includes: acquiring a first data sequence of an N-th time period and a second data sequence of an M-th time period, wherein N and M are both positive numbers and the N-th time period is later than the M-th time period; obtaining a waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence; calculating a second data sequence trend result according to the time information of the N-th time period and the time information of the M-th time period, wherein the second data sequence trend result represents the data change trend of the second data sequence; and obtaining a detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result.
FIG. 1 schematically illustrates an application scenario diagram of a time series data anomaly detection method according to an embodiment of the present disclosure.
As shown in FIG. 1, an application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 through the network 104 using at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages, etc. Various communication client applications, such as a shopping class application, a web browser application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the method for detecting a time series data anomaly provided by the embodiment of the disclosure may be generally performed by the server 105. Accordingly, the time series data abnormality detection apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The method for detecting the anomaly of the time series data provided by the embodiment of the present disclosure may also be performed by a server or a server cluster which is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105. Accordingly, the apparatus for detecting a time series data abnormality provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in FIG. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 schematically illustrates a flowchart of a time series data anomaly detection method according to an embodiment of the present disclosure.
As shown in FIG. 2, the time series data anomaly detection method of this embodiment includes operations S210 to S250.
In operation S210, a first data sequence of an N-th time period and a second data sequence of an M-th time period are acquired, where N and M are both positive numbers and the N-th time period is later than the M-th time period.
In operation S220, a waveform similarity result between the first data sequence and the second data sequence is obtained according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence.
In operation S230, a second data sequence trend result is calculated according to the time information of the N-th time period and the time information of the M-th time period.
In operation S240, a detection result of the first data sequence is obtained according to the waveform similarity result and the second data sequence trend result.
In operation S250, when the detection result of the first data sequence indicates an anomaly, the detection result is displayed on the interactive interface.
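The last two operations can be sketched as follows. The disclosure does not state how the waveform similarity result and the trend result are combined, so the decision rule here (flag the first data sequence only when the waveforms differ and the trend result does not report a rise) is purely a hypothetical placeholder, as are the names `DetectionResult`, `detect`, and `display_if_abnormal`.

```python
from dataclasses import dataclass


@dataclass
class DetectionResult:
    abnormal: bool
    reason: str


def detect(waveform_similar: bool, trend_rising: bool) -> DetectionResult:
    """Combine the two intermediate results (hypothetical rule, S240)."""
    if waveform_similar:
        return DetectionResult(False, "waveform matches the earlier period")
    if trend_rising:
        return DetectionResult(False, "deviation attributed to an upward trend")
    return DetectionResult(True, "waveform differs and no trend accounts for it")


def display_if_abnormal(result: DetectionResult, show=print) -> bool:
    """Pass abnormal results to the interactive interface (S250).

    `show` stands in for whatever the interface uses to render a message.
    """
    if result.abnormal:
        show(f"Anomaly detected: {result.reason}")
        return True
    return False
```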
According to an embodiment of the present disclosure, the first data sequence and the second data sequence are each composed of a plurality of service data arranged in time order. The first data sequence and the second data sequence respectively characterize data generated by handling business in the N-th time period and the M-th time period, such as business transaction data, energy consumption data, and the like.
For example, the first data sequence of the N-th time period may be the transaction data from one day ago, and the second data sequence of the M-th time period may be the transaction data from two days ago. The N-th time period and the M-th time period are not limited to days and may be measured in hours, minutes, months, or the like. For example, the N-th time period may be October of last year and the M-th time period may be June of last year. As another example, the first data sequence of the N-th time period may be the energy consumption data generated by business handled one month ago, and the second data sequence of the M-th time period may be the energy consumption data generated by business handled two months ago. The energy consumption data may be, for example, the electrical energy consumed by a computer.
According to an embodiment of the present disclosure, the data amount of the first data sequence may be a data length of the first data sequence, i.e. the amount of data contained in the first data sequence.
According to embodiments of the present disclosure, the similarity between the first data sequence and the second data sequence may be derived based on the Euclidean distance, the Pearson correlation coefficient, the Fast Dynamic Time Warping (FastDTW) algorithm, and the like.
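Two of these measures can be sketched in a few lines. The Pearson function follows the standard definition. The second function is classic dynamic time warping with an absolute-difference cost, shown in place of FastDTW, which is an approximation of the same distance; the function names are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


def dtw_distance(a, b):
    """Exact dynamic time warping distance (O(len(a) * len(b)) time)."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    prev = [0.0] + [math.inf] * len(b)  # row for the empty prefix of a
    for x in a:
        cur = [math.inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return prev[-1]
```

Dynamic time warping tolerates small shifts along the time axis, whereas the Euclidean distance and the Pearson coefficient compare samples position by position and need sequences of equal length.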
According to an embodiment of the present disclosure, the similarity between the first data sequence and the second data sequence characterizes the similarity between the data in the first data sequence and the data in the second data sequence.
According to an embodiment of the present disclosure, the data in the first data sequence are acquired at a fixed frequency, so the number of acquisitions within a fixed time period is also fixed. The data amount of the first data sequence may therefore be associated with the time information of the N-th time period.
The waveform similarity result between the first data sequence and the second data sequence characterizes the similarity between the curve formed by the first data sequence against the time information of the N-th time period and the curve formed by the second data sequence against the time information of the M-th time period. Typically, the time information of the N-th time period is taken as the abscissa and the data in the first data sequence as the ordinate, forming a curve that reflects how the data in the first data sequence change over time. The curve for the second data sequence is formed in the same way from the time information of the M-th time period. The waveform similarity result between the first data sequence and the second data sequence can be obtained based on the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence.
According to the embodiment of the disclosure, according to the time information of the nth time period and the time information of the mth time period, data in the first data sequence and the second data sequence can be processed, and the trend result of the second data sequence is obtained through analysis.
According to an embodiment of the present disclosure, the second data sequence trend result characterizes a data change trend of the second data sequence.
According to embodiments of the present disclosure, the detection result may characterize whether an anomaly exists in the first data sequence. Anomalies may take the form of point anomalies, context anomalies, and aggregate anomalies. A point anomaly is typically an outlier. A context anomaly is typically a point whose value is not abnormal in itself but is abnormal in the context in which it occurs. An aggregate anomaly is typically a subset in which no single point is abnormal but which is abnormal as a whole relative to the full data set.
For example, the first data sequence may be transaction data generated by a financial institution's on-site transaction business. A point anomaly may be a transaction that exceeds the upper transaction limit; a context anomaly may be a transaction recorded at 0:00; an aggregate anomaly may be that the total transaction amount for Monday reaches 50% of the total transaction amount for the week.
According to an embodiment of the present disclosure, the interactive interface allows operation and maintenance personnel to see intuitively how the first data sequence changes or how it differs from the second data sequence. When the first data sequence is abnormal, the anomaly can be promptly displayed to the operation and maintenance personnel on the interactive interface. They can then operate the interactive interface to obtain more detailed data, which makes it easier to take targeted risk prevention actions.
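The three anomaly forms in the transaction example can be made concrete with a small sketch. The business hours, the upper limit, and the 50% share are stand-in parameters chosen to mirror the example; the function names are hypothetical, and real rules would depend on the institution.

```python
from datetime import datetime


def is_point_anomaly(amount, upper_limit):
    """A single transaction that exceeds the upper transaction limit."""
    return amount > upper_limit


def is_context_anomaly(timestamp, open_hour=9, close_hour=17):
    """An ordinary amount recorded outside assumed business hours."""
    return not (open_hour <= timestamp.hour < close_hour)


def is_aggregate_anomaly(daily_totals, day="Monday", share=0.5):
    """One day's total reaches the given share of the weekly total."""
    week_total = sum(daily_totals.values())
    return week_total > 0 and daily_totals[day] / week_total >= share


# Usage: an on-site transaction recorded at 0:00 is flagged by its context.
print(is_context_anomaly(datetime(2024, 3, 18, 0, 0)))
```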
According to an embodiment of the present disclosure, a first data sequence of an Nth time period and a second data sequence of an Mth time period are acquired. A waveform similarity result between the first data sequence and the second data sequence is obtained according to the similarity between the two sequences and the data amount of the first data sequence. Because the influence of the data amount on the time series data is taken into account, the waveform similarity result reflects the waveform similarity of the data sequences as a whole, which reduces the complexity of subsequent calculation. A second data sequence trend result is calculated according to the time information of the Nth time period and the time information of the Mth time period. A detection result of the first data sequence is then obtained according to the waveform similarity result and the second data sequence trend result. Considering both the waveform similarity and the trend change between the time series improves the accuracy of time series data anomaly detection.
According to an embodiment of the present disclosure, the first data in the first data sequence, or the second data in the second data sequence, comprises at least one of: transaction data, device operation data, network traffic data.
For example, where the first data and the second data are device operation data generated by a business, the device operation data is often periodic time series data, so it is important to determine the similarity between the first device operation data sequence and the second device operation data sequence. By analyzing this similarity, it can be accurately determined whether the first device operation data sequence has the periodic characteristic of device operation.
A waveform similarity result between the first device operation data sequence and the second device operation data sequence is obtained according to the similarity between the two sequences and the data amount of the first device operation data sequence. Because the influence of the data amount on the time series data is taken into account, the waveform similarity result reflects the waveform similarity of the data sequences as a whole, which reduces the complexity of subsequent calculation.
A detection result of the first device operation data sequence is then obtained according to the waveform similarity result and the trend result of the second device operation data sequence, so that abnormal device operation data can be detected quickly and the continuity of device operation is improved.
For example, in the case that the first data and the second data are network traffic data generated by handling a service, a waveform similarity result between the first network traffic data sequence and the second network traffic data sequence can be obtained according to the similarity between the two sequences and the data amount of the first network traffic data sequence. The waveform similarity result reflects the overall waveform difference between the first network traffic data sequence and the historical second network traffic data sequence.
A detection result of the first network traffic data sequence is obtained according to the waveform similarity result and the trend result of the second network traffic data sequence. When the detection result of the first network traffic data sequence indicates an anomaly, the abnormal network traffic data can be displayed on the interactive interface, so that network operation and maintenance personnel can take measures to stabilize the network in time. Anomaly detection on network traffic time series data can therefore effectively help ensure network stability.
According to an embodiment of the present disclosure, obtaining a waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence includes: calculating a sequence average similarity according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence; and obtaining the waveform similarity result between the first data sequence and the second data sequence according to the sequence average similarity.
According to an embodiment of the present disclosure, the similarity between the first data sequence and the second data sequence may be a distance value obtained by inputting the first data sequence and the second data sequence into the FastDTW algorithm.
According to the embodiment of the disclosure, because time series data is large in volume and nonlinear, the similarity between the first data sequence and the second data sequence can be divided by the data amount of the first data sequence to obtain the sequence average similarity.
According to an embodiment of the present disclosure, in a case where the similarity between the first data sequence and the second data sequence is greater than a similarity threshold, the waveform similarity result is determined to be that the waveforms of the first data sequence and the second data sequence are similar.
According to an embodiment of the present disclosure, in a case where the similarity between the first data sequence and the second data sequence is smaller than the similarity threshold, the waveform similarity result is determined to be that the waveforms of the first data sequence and the second data sequence are dissimilar.
According to the embodiment of the disclosure, the sequence average similarity is calculated according to the similarity between the first data sequence and the second data sequence and the data quantity of the first data sequence. And then, according to the average similarity of the sequences, obtaining a waveform similarity result between the first data sequence and the second data sequence, and improving the accuracy of the waveform similarity result.
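The sequence average similarity described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it substitutes the classic quadratic-time dynamic time warping (DTW) distance for FastDTW so that the example needs no third-party package, and the function names are hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance with |x - y| as the local cost.

    Stand-in for FastDTW, which approximates this distance more cheaply.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def sequence_average_similarity(first, second):
    """Similarity (here a DTW distance) divided by the data amount of the first sequence."""
    if not first:
        raise ValueError("first sequence must not be empty")
    return dtw_distance(first, second) / len(first)
```

Dividing by the length of the first sequence makes the value comparable across time periods that contain different numbers of samples, which is the stated purpose of the averaging step.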
According to an embodiment of the present disclosure, calculating the second data sequence trend result according to the time information of the Nth time period and the time information of the Mth time period includes: calculating a difference sequence between the first data sequence and the second data sequence according to the time information of the Nth time period and the time information of the Mth time period; and detecting the data change trend of the difference sequence based on a trend detection algorithm to obtain the second data sequence trend result.
According to the embodiment of the disclosure, according to the time information of the nth time period and the time information of the mth time period, the data in the first data sequence and the data in the second data sequence can be aligned, so that the two sets of data are in one-to-one correspondence, and the first alignment sequence and the second alignment sequence are obtained. For example, the first data of the nth time period is aligned with the first data of the mth time period, and so on, until the last data of the nth time period is aligned with the last data of the mth time period.
When the data amounts of the Nth time period and the Mth time period differ, the data of the Mth time period may be interpolated or filtered so that the two time periods have the same data amount.
According to an embodiment of the disclosure, the data in the first alignment sequence and the data in the second alignment sequence are differenced to obtain a sequence of differences between the first data sequence and the second data sequence.
According to the embodiment of the disclosure, the difference sequence between the first data sequence and the second data sequence is calculated according to the time information of the N-th time period and the time information of the M-th time period, so that the calculation complexity of obtaining the trend result of the second data sequence is reduced, and the performance of a computer is improved.
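The alignment and differencing steps can be sketched as below. The sketch assumes that alignment is by position within each time period, that linear interpolation is used when the data amounts differ, and that the difference is taken as first minus second; none of these choices is fixed by the text, and the function names are hypothetical.

```python
def resample(values, target_len):
    """Linearly interpolate `values` onto `target_len` evenly spaced positions."""
    if target_len < 1 or not values:
        raise ValueError("need a non-empty sequence and a positive target length")
    if len(values) == 1 or target_len == 1:
        return [float(values[0])] * target_len
    scale = (len(values) - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * scale
        lo = min(int(pos), len(values) - 2)  # left neighbour index
        frac = pos - lo                      # weight of the right neighbour
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


def difference_sequence(first, second):
    """Align the second sequence to the first one-to-one, then subtract element-wise."""
    if len(second) != len(first):
        second = resample(second, len(first))
    return [x - y for x, y in zip(first, second)]
```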
According to an embodiment of the present disclosure, detecting the data change trend of the difference sequence based on a trend detection algorithm to obtain the second data sequence trend result includes: detecting the data change trend of the difference sequence based on the detection algorithm to obtain a difference trend result; and, in a case where the difference trend result characterizes that the difference sequence has an upward trend, obtaining the second data sequence trend result.
According to embodiments of the present disclosure, the trend detection algorithm may be a Mann-Kendall test algorithm, a slope method, a Cox-Stuart test, or the like.
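As one illustration of the listed options, the Mann-Kendall test can be sketched with the Python standard library only. The one-sided test, the significance level `alpha`, and the function name are assumptions made for this example; the disclosure does not specify them.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall_upward(values, alpha=0.05):
    """Return True if a one-sided Mann-Kendall test finds an increasing trend."""
    n = len(values)
    if n < 3:
        return False
    # S = (number of increasing pairs) - (number of decreasing pairs)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, with the standard correction for groups of tied values
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if var_s <= 0:
        return False  # all values identical: no trend
    # Continuity-corrected standard normal statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return z > NormalDist().inv_cdf(1 - alpha)
```

The test is rank-based, so it does not assume that the difference sequence is linear or normally distributed, which suits nonlinear time series data.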
According to an embodiment of the present disclosure, in a case where the difference trend result characterizes that the difference sequence has an upward trend, obtaining the second data sequence trend result includes: determining, from the difference sequence, a difference subsequence having the upward trend; fitting the difference subsequence based on a least squares algorithm to obtain a fitted difference subsequence; and obtaining the second data sequence trend result according to the rising base point and the rising absolute value of the fitted difference subsequence.
According to the embodiment of the disclosure, in a case where the difference trend result characterizes that the difference sequence has an upward trend, temporally adjacent data are differenced according to the time information of the difference sequence to obtain a rising start point p in the difference sequence. The difference subsequence having the upward trend is then determined from the difference sequence according to the rising start point p.
According to an embodiment of the disclosure, the second data sequence trend result is determined to be an upward trend of the second data sequence in case the difference trend result characterizes that the difference sequence does not have an upward trend.
According to an embodiment of the disclosure, the difference subsequence is fitted based on a least squares algorithm to obtain a fitted difference subsequence. The rising start point p corresponds one-to-one to the rising base point of the fitted difference subsequence, and the rising base point is determined by the time information of the rising start point p.
According to an embodiment of the present disclosure, the rising absolute value may be the extreme value data in the fitted difference subsequence, that is, the peak data of the difference subsequence having the upward trend.
According to an embodiment of the present disclosure, obtaining the second data sequence trend result according to the rising base point and the rising absolute value of the fitted difference subsequence includes: calculating a rise ratio according to the rising base point and the rising absolute value of the fitted difference subsequence; and, in a case where the rise ratio satisfies the rise ratio threshold, obtaining the second data sequence trend result based on the difference data in the difference sequence and the difference sequence mean, where the difference sequence mean represents the mean of the difference data in the difference sequence.
According to an embodiment of the present disclosure, the rising base point may be denoted by $x_{base}$, the rising absolute value by $x$, and the rise ratio by $r_{real}$, and the rise ratio may be calculated by formula (1).
$$r_{real} = \frac{x}{x_{base}} \tag{1}$$
According to an embodiment of the present disclosure, in the case where the rise ratio is not greater than the rise ratio threshold, the second data sequence trend result is determined to be a rise trend.
According to an embodiment of the present disclosure, in the case where the rise ratio is greater than the rise ratio threshold, a second data sequence trend result is obtained based on the difference data in the difference sequence and the difference sequence mean.
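The least squares fit and formula (1) can be sketched as follows. The sketch assumes a straight-line fit, takes the fitted value at the first point of the subsequence (the rising start point p) as the rising base point, and takes the maximum fitted value as the rising absolute value; this is one reading of the description, and the function names are hypothetical.

```python
def least_squares_line(ys):
    """Ordinary least squares fit y = slope * t + intercept over t = 0..len(ys)-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def rise_ratio(difference_subsequence):
    """Rise ratio r_real = x / x_base computed on the fitted difference subsequence."""
    slope, intercept = least_squares_line(difference_subsequence)
    fitted = [slope * t + intercept for t in range(len(difference_subsequence))]
    x_base = fitted[0]   # assumed rising base point: fitted value at start point p
    x = max(fitted)      # assumed rising absolute value: extreme value of the fit
    if x_base == 0:
        raise ValueError("rising base point is zero; the ratio is undefined")
    return x / x_base
```

For the subsequence `[2, 4, 6, 8]` the fit is exact, so the base point is 2, the extreme value is 8, and the rise ratio is 4.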
According to the embodiment of the disclosure, difference data in the difference sequence and a difference sequence mean value can be subjected to difference comparison through a T test algorithm, so that a mean value difference sequence result is obtained. And under the condition that the mean value difference sequence result represents that the difference sequence has an ascending trend, determining that the second data sequence trend result is that the second data sequence is the ascending trend. And under the condition that the mean value difference sequence result represents that the difference sequence does not have an ascending trend, determining that the second data sequence trend result is not the ascending trend.
Fig. 3 schematically illustrates a flowchart of calculating a second data sequence trend result according to time information of an nth time period and time information of an mth time period according to an embodiment of the present disclosure.
As shown in fig. 3, the calculation of the second data sequence trend result according to the time information of the nth time period and the time information of the mth time period of this embodiment includes operations S301 to S311.
In operation S301, a difference sequence between the first data sequence and the second data sequence is calculated according to the time information of the nth time period and the time information of the mth time period.
In operation S302, a data trend of the difference sequence is detected based on a detection algorithm, and a difference trend result is obtained.
In operation S303, it is determined whether the difference trend result characterizes that the difference sequence has an upward trend. If yes, operation S304 is performed; if not, operation S310 is performed.
In operation S304, a difference subsequence having an upward trend is determined from the difference sequence.
In operation S305, the difference subsequence is fitted based on a least squares algorithm, resulting in a fitted difference subsequence.
In operation S306, a rise ratio is calculated from the rising base point and the rising absolute value of the fitted difference subsequence.
In operation S307, it is determined whether the rise ratio satisfies the rise ratio threshold. If yes, operation S308 is performed; if not, operation S310 is performed.
In operation S308, difference data in the difference sequence and the average value of the difference sequence may be compared by a T-test algorithm to obtain an average value difference sequence result.
In operation S309, it is determined whether the mean difference sequence result characterizes that the difference sequence has an upward trend. If yes, operation S310 is performed; if not, operation S311 is performed.
In operation S310, the second data sequence trend result is determined to be that the second data sequence has an upward trend.
In operation S311, the second data sequence trend result is determined to be that the second data sequence does not have an upward trend.
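The branching of operations S303 to S311 can be summarized in a short sketch. The inputs are the outcomes of the earlier steps, "satisfies the rise ratio threshold" is read as "greater than the threshold" following the description above, and the function and parameter names are hypothetical.

```python
def second_sequence_trend(diff_has_upward_trend, rise_ratio,
                          rise_ratio_threshold, t_test_upward):
    """Follow the decision branches of Fig. 3 and return the trend result."""
    # S303: no upward trend in the difference sequence -> S310
    if not diff_has_upward_trend:
        return "upward"
    # S307: rise ratio does not exceed the threshold -> S310
    if rise_ratio <= rise_ratio_threshold:
        return "upward"
    # S308/S309: the T test decides between S310 and S311
    return "upward" if t_test_upward else "not upward"
```

In this flow the only path to operation S311 is a difference sequence with an upward trend, a rise ratio above the threshold, and a T test that does not confirm the upward trend.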
According to an embodiment of the present disclosure, the above method further includes: acquiring a third data sequence of a Kth time period, where K is a positive number greater than zero and the Mth time period is later than the Kth time period; and inputting the third data sequence into a regression model to obtain a third data sequence trend result.
For example, the first data sequence of the Nth time period may be transaction data from one day ago, the second data sequence of the Mth time period may be transaction data from two days ago, and the third data sequence of the Kth time period may be transaction data from seven days ago.
According to embodiments of the present disclosure, the regression model may be a decision tree regression model, a polynomial regression model, a random forest regression model, a linear regression model, or the like.
According to an embodiment of the present disclosure, inputting the third data sequence into the regression model to obtain the third data sequence trend result includes: inputting the third data sequence into the regression model to obtain a fitted slope; and obtaining the third data sequence trend result based on the fitted slope.
According to an embodiment of the present disclosure, obtaining the third data sequence trend result based on the fitted slope may include: determining that the third data sequence trend result characterizes the third data sequence as having an upward trend if the fitted slope is greater than the slope threshold; and determining that the third data sequence trend result characterizes the third data sequence as having a non-upward trend if the fitted slope is not greater than the slope threshold.
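The slope-based decision can be sketched with a linear regression model, which is one of the listed options. The default slope threshold of 0 and the function names are assumptions for illustration only.

```python
def fitted_slope(values):
    """Slope of the ordinary least squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    return sxy / sxx


def third_sequence_trend(values, slope_threshold=0.0):
    """Upward only if the fitted slope is strictly greater than the threshold."""
    return "upward" if fitted_slope(values) > slope_threshold else "not upward"
```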
According to an embodiment of the present disclosure, obtaining the detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result further includes: obtaining the detection result of the first data according to the waveform similarity result, the second data sequence trend result, and the third data sequence trend result.
For example, when the waveform similarity result indicates that the waveforms of the first data sequence and the second data sequence are dissimilar, the second data sequence has an upward trend, and the third data sequence has an upward trend, the first data sequence is judged to be abnormal, and an alarm is needed to alert operation and maintenance personnel.
When the waveforms of the first data sequence and the second data sequence are dissimilar, the second data sequence has an upward trend, and the third data sequence has a non-upward trend, the first data sequence is judged to be normal, and no alarm is needed.
When the waveforms of the first data sequence and the second data sequence are dissimilar, the second data sequence has a non-upward trend, and the third data sequence has an upward trend, the first data sequence is judged to be abnormal, and an alarm is needed to alert operation and maintenance personnel.
When the waveforms of the first data sequence and the second data sequence are dissimilar, the second data sequence has a non-upward trend, and the third data sequence has a non-upward trend, the first data sequence is judged to be normal, and no alarm is needed.
When the waveforms of the first data sequence and the second data sequence are similar, the second data sequence has an upward trend, and the third data sequence has an upward trend, the first data sequence is judged to be abnormal, and an alarm is needed to alert operation and maintenance personnel.
When the waveforms of the first data sequence and the second data sequence are similar, the second data sequence has an upward trend, and the third data sequence has a non-upward trend, the first data sequence is judged to be normal, and no alarm is needed.
When the waveforms of the first data sequence and the second data sequence are similar, the second data sequence has a non-upward trend, and the third data sequence has a non-upward trend, the first data sequence is judged to be normal, and no alarm is needed.
When the waveforms of the first data sequence and the second data sequence are similar, the second data sequence has a non-upward trend, and the third data sequence has an upward trend, the first data sequence is judged to be abnormal, and an alarm is needed to alert operation and maintenance personnel.
According to the embodiment of the disclosure, the detection result of the first data is comprehensively judged according to the waveform similarity result, the second data sequence trend result and the third data sequence trend result, so that the detection accuracy is improved.
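The eight example cases above can be written as a lookup table. The table entries are copied from the examples; the tuple layout and the names `DETECTION_TABLE` and `detect` are hypothetical.

```python
# Key: (waveforms similar, second sequence upward, third sequence upward)
# Value: True if the first data sequence is judged abnormal (alarm needed)
DETECTION_TABLE = {
    (False, True, True): True,
    (False, True, False): False,
    (False, False, True): True,
    (False, False, False): False,
    (True, True, True): True,
    (True, True, False): False,
    (True, False, False): False,
    (True, False, True): True,
}


def detect(waveforms_similar, second_upward, third_upward):
    """Return 'abnormal' or 'normal' for one combination of the three results."""
    key = (waveforms_similar, second_upward, third_upward)
    return "abnormal" if DETECTION_TABLE[key] else "normal"
```

Keeping the cases in a table rather than in nested conditions makes it easy to check each entry against the examples and to adjust individual combinations.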
Fig. 4 schematically illustrates a flowchart of a method of time-series data anomaly detection according to another embodiment of the present disclosure.
As shown in fig. 4, the time series data abnormality detection method of this embodiment includes operations S401 to S407.
In operation S401, the first data sequence of the nth time period and the second data sequence of the mth time period are acquired, N, M are both positive numbers greater than zero, and the nth time period is later than the mth time period.
In operation S402, a waveform similarity result between the first data sequence and the second data sequence is obtained according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence.
In operation S403, a second data sequence trend result is calculated according to the time information of the nth time period and the time information of the mth time period.
In operation S404, a third data sequence of a kth period is acquired, K being a positive number greater than zero, the mth period being later than the kth period.
In operation S405, the third data sequence is input into the regression model to obtain a fit slope.
In operation S406, a third data sequence trend result is obtained based on the fitted slope.
In operation S407, a detection result of the first data is obtained according to the waveform similarity result, the second data sequence trend result, and the third data sequence trend result.
Fig. 5 schematically shows a block diagram of a time series data abnormality detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the time series data abnormality detection apparatus 500 of this embodiment includes a first acquisition module 510, a first obtaining module 520, a calculation module 530, a second obtaining module 540, and a presentation module 550.
The first acquisition module 510 is configured to acquire a first data sequence of an Nth time period and a second data sequence of an Mth time period, where N and M are both positive numbers greater than zero and the Nth time period is later than the Mth time period. In an embodiment, the first acquisition module 510 may be configured to perform the operation S210 described above, which will not be repeated here.
The first obtaining module 520 is configured to obtain a waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data amount of the first data sequence. In an embodiment, the first obtaining module 520 may be configured to perform the operation S220 described above, which will not be repeated here.
The calculation module 530 is configured to calculate a second data sequence trend result according to the time information of the Nth time period and the time information of the Mth time period, where the second data sequence trend result characterizes a data change trend of the second data sequence. In an embodiment, the calculation module 530 may be configured to perform the operation S230 described above, which will not be repeated here.
The second obtaining module 540 is configured to obtain a detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result. In an embodiment, the second obtaining module 540 may be configured to perform the operation S240 described above, which will not be repeated here.
The presentation module 550 is configured to display the detection result on the interactive interface when the detection result of the first data sequence is abnormal. In an embodiment, the presentation module 550 may be configured to perform the operation S250 described above, which will not be repeated here.
According to an embodiment of the present disclosure, the first obtaining module 520 includes a first calculating sub-module and a first obtaining sub-module. The first calculation submodule is used for calculating the average sequence similarity according to the similarity between the first data sequence and the second data sequence and the data volume of the first data sequence. The first obtaining submodule is used for obtaining a waveform similarity result between the first data sequence and the second data sequence according to the sequence average similarity.
According to an embodiment of the present disclosure, the calculation module 530 includes a second calculation sub-module and a detection sub-module. The second calculation submodule is used for calculating a difference sequence between the first data sequence and the second data sequence according to the time information of the N-th time period and the time information of the M-th time period. The detection sub-module is used for detecting the data change trend of the difference sequence based on a trend detection algorithm to obtain a trend result of the second data sequence.
According to an embodiment of the present disclosure, a detection sub-module includes a detection unit and a first obtaining unit. The detection unit is used for detecting the data change trend of the difference sequence based on a detection algorithm to obtain a difference trend result. The first obtaining unit is used for obtaining a trend result of the second data sequence under the condition that the trend result of the difference represents that the difference sequence has an upward trend.
According to an embodiment of the present disclosure, the first obtaining unit includes a first determining subunit, a fitting subunit, and a first obtaining subunit. The first determining subunit is configured to determine, from the difference sequence, a difference subsequence with an increasing trend if the difference trend result indicates that the difference sequence has the increasing trend. The fitting sub-unit is used for fitting the difference sub-sequence based on a least square method algorithm to obtain a fitting difference sub-sequence. The first obtaining subunit is used for obtaining a trend result of the second data sequence according to the rising basic point and the rising absolute value of the fitting difference value subsequence.
According to an embodiment of the present disclosure, obtaining the second data sequence trend result according to the rising base point and the rising absolute value of the fitted difference subsequence includes: calculating a rise ratio according to the rising base point and the rising absolute value of the fitted difference subsequence; and, in a case where the rise ratio satisfies the rise ratio threshold, obtaining the second data sequence trend result based on the difference data in the difference sequence and the difference sequence mean, where the difference sequence mean represents the mean of the difference data in the difference sequence.
According to an embodiment of the disclosure, the apparatus further includes a second acquisition module and an input module. The second acquisition module is used for acquiring a third data sequence in a kth time period, K is a positive number greater than zero, and the mth time period is later than the kth time period. The input module is used for inputting the third data sequence into the regression model to obtain a third data sequence trend result.
According to an embodiment of the present disclosure, the input module includes an input sub-module and a second obtaining sub-module. The input submodule is used for inputting the third data sequence into the regression model to obtain a fitting slope. The second obtaining sub-module is used for obtaining a third data sequence trend result based on the fitting slope.
According to an embodiment of the present disclosure, the second obtaining module 540 further comprises a third obtaining sub-module. The third obtaining submodule is used for obtaining a detection result of the first data according to the waveform similarity result, the second data sequence trend result and the third data sequence trend result.
It should be noted that, in the embodiment of the present disclosure, the time series data abnormality detection device portion corresponds to the time series data abnormality detection method portion in the embodiment of the present disclosure, and the description of the time series data abnormality detection device portion specifically refers to the time series data abnormality detection method portion, which is not described herein.
According to embodiments of the present disclosure, any number of the first acquisition module 510, the first obtaining module 520, the calculation module 530, the second obtaining module 540, and the presentation module 550 may be combined and implemented in one module, or any one of these modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the first acquisition module 510, the first obtaining module 520, the calculation module 530, the second obtaining module 540, and the presentation module 550 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable way of integrating or packaging a circuit, or in any one of, or a suitable combination of, the three implementations of software, hardware, and firmware. Alternatively, at least one of the first acquisition module 510, the first obtaining module 520, the calculation module 530, the second obtaining module 540, and the presentation module 550 may be implemented at least in part as a computer program module which, when executed, may perform the corresponding functions.
Fig. 6 schematically illustrates a block diagram of an electronic device adapted to implement a method of time-series data anomaly detection, according to an embodiment of the present disclosure.
As shown in fig. 6, an electronic device 600 according to an embodiment of the present disclosure includes a processor 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The processor 601 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 601 may also include on-board memory for caching purposes. The processor 601 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are stored. The processor 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. The processor 601 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 602 and/or the RAM 603. Note that the program may be stored in one or more memories other than the ROM 602 and the RAM 603. The processor 601 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 600 may also include an input/output (I/O) interface 605, which is also connected to the bus 604. The electronic device 600 may also include one or more of the following components connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 602 and/or RAM 603 and/or one or more memories other than ROM 602 and RAM 603 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, is configured to cause the computer system to implement the method for detecting a time series data anomaly provided by an embodiment of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 601. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of signals over a network medium, and downloaded and installed via the communication section 609, and/or installed from the removable medium 611. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 601. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Such programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (14)

1. A method for detecting anomalies in time series data, the method comprising:
acquiring a first data sequence of an N-th time period and a second data sequence of an M-th time period, wherein N and M are positive numbers greater than zero, the N-th time period is later than the M-th time period, and the first data sequence and the second data sequence each consist of a plurality of pieces of service data arranged in time-series order;
obtaining a waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data volume of the first data sequence;
calculating a second data sequence trend result according to the time information of the N-th time period and the time information of the M-th time period, wherein the second data sequence trend result represents the data change trend of the second data sequence;
obtaining a detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result; and
in a case where the detection result of the first data sequence indicates an anomaly, displaying the detection result on an interactive interface.
2. The method of claim 1, wherein obtaining the waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data volume of the first data sequence comprises:
calculating a sequence average similarity according to the similarity between the first data sequence and the second data sequence and the data volume of the first data sequence; and
obtaining the waveform similarity result between the first data sequence and the second data sequence according to the sequence average similarity.
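The averaging step of claim 2 can be sketched as follows. This is only an illustrative reading, not the claimed implementation: the claims do not name the similarity measure, so the sketch assumes a cumulative dynamic time warping (DTW) distance, and the function names and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cumulative DTW distance with absolute-difference cost (assumed measure)."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def sequence_average_similarity(first: Sequence[float],
                                second: Sequence[float]) -> float:
    """Similarity divided by the data volume (length) of the first sequence."""
    return dtw_distance(first, second) / len(first)


def waveform_similar(first: Sequence[float], second: Sequence[float],
                     threshold: float) -> bool:
    """Hypothetical decision rule: similar when the average distance is small."""
    return sequence_average_similarity(first, second) <= threshold
```

Dividing by the data volume makes the score comparable across time periods that contain different numbers of points, which is presumably why the claim ties the result to the length of the first data sequence.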
3. The method of claim 1, wherein calculating the second data sequence trend result according to the time information of the N-th time period and the time information of the M-th time period comprises:
calculating a difference sequence between the first data sequence and the second data sequence according to the time information of the N-th time period and the time information of the M-th time period; and
detecting the data change trend of the difference sequence based on a trend detection algorithm to obtain the second data sequence trend result.
4. The method of claim 3, wherein detecting the data change trend of the difference sequence based on the trend detection algorithm to obtain the second data sequence trend result comprises:
detecting the data change trend of the difference sequence based on the trend detection algorithm to obtain a difference trend result; and
in a case where the difference trend result indicates that the difference sequence has an upward trend, obtaining the second data sequence trend result.
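A minimal sketch of the difference sequence and trend detection of claims 3 and 4 follows. It rests on assumptions not stated in the claims: the two sequences are taken to be already aligned point by point using their time information, and the Mann-Kendall test stands in for the unnamed trend detection algorithm. All function names and the critical value `z_crit` are hypothetical.

```python
import math
from typing import List, Sequence


def difference_sequence(first: Sequence[float],
                        second: Sequence[float]) -> List[float]:
    """Pointwise difference of two sequences assumed aligned by time offset."""
    if len(first) != len(second):
        raise ValueError("sequences must be aligned to the same length")
    return [x - y for x, y in zip(first, second)]


def mann_kendall_z(values: Sequence[float]) -> float:
    """Mann-Kendall Z statistic (no tie correction, for brevity)."""
    n = len(values)
    if n < 2:
        return 0.0
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise change
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def has_upward_trend(values: Sequence[float], z_crit: float = 1.645) -> bool:
    """One-sided test: True when Z exceeds the assumed critical value."""
    return mann_kendall_z(values) > z_crit
```

For example, `has_upward_trend(difference_sequence(first, second))` would gate the later steps so that only a rising gap between the two periods produces a trend result.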
5. The method of claim 4, wherein, in a case where the difference trend result indicates that the difference sequence has an upward trend, obtaining the second data sequence trend result comprises:
in a case where the difference trend result indicates that the difference sequence has an upward trend, determining, from the difference sequence, a difference subsequence having the upward trend;
fitting the difference subsequence based on a least squares algorithm to obtain a fitted difference subsequence; and
obtaining the second data sequence trend result according to the rising base point and the rising absolute value of the fitted difference subsequence.
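The least squares fitting step of claim 5 might look like the sketch below. The claims do not define the rising base point or the rising absolute value, so the sketch assumes that the base point is the fitted value at the start of the subsequence and that the absolute value is the fitted rise from start to end. The function name `fit_rise` is hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_rise(subsequence: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Fit y = intercept + slope * index by ordinary least squares.

    Returns (base_point, absolute_rise, fitted_values), where base_point is
    the fitted value at index 0 and absolute_rise is the magnitude of the
    fitted change over the subsequence (both assumed interpretations).
    """
    n = len(subsequence)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2.0
    mean_y = sum(subsequence) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(subsequence))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * i for i in range(n)]
    base_point = fitted[0]
    absolute_rise = abs(fitted[-1] - fitted[0])
    return base_point, absolute_rise, fitted
```

Working from the fitted line rather than the raw points keeps a single noisy value at either end of the subsequence from dominating the measured rise.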
6. The method of claim 5, wherein obtaining the second data sequence trend result according to the rising base point and the rising absolute value of the fitted difference subsequence comprises:
calculating a rise ratio according to the rising base point and the rising absolute value of the fitted difference subsequence; and
in a case where the rise ratio meets a rise ratio threshold, obtaining the second data sequence trend result based on the difference data in the difference sequence and a difference sequence mean value, wherein the difference sequence mean value represents the mean of the difference data in the difference sequence.
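The rise ratio check of claim 6 can be illustrated with the sketch below. Several points are assumptions: the ratio is taken as the rising absolute value divided by the magnitude of the rising base point, "meets" is read as greater than or equal to the threshold, and, because the claims do not say how the difference data and their mean are combined, the sketch simply reports each value's deviation from the mean as one hypothetical reading. All names are illustrative.

```python
from statistics import fmean
from typing import Dict, List, Optional, Sequence, Union


def rise_ratio(base_point: float, absolute_rise: float) -> float:
    """Rise relative to the base point (assumed definition)."""
    if base_point == 0:
        raise ValueError("base point must be non-zero to form a ratio")
    return absolute_rise / abs(base_point)


def second_sequence_trend_result(
    differences: Sequence[float],
    base_point: float,
    absolute_rise: float,
    ratio_threshold: float,
) -> Optional[Dict[str, Union[float, List[float]]]]:
    """Return a trend summary only when the rise ratio meets the threshold."""
    ratio = rise_ratio(base_point, absolute_rise)
    if ratio < ratio_threshold:
        return None  # rise too small relative to the base point
    mean = fmean(differences)
    return {
        "rise_ratio": ratio,
        "difference_mean": mean,
        # Hypothetical use of the mean: deviation of each difference value.
        "deviations": [d - mean for d in differences],
    }
```

Using a ratio instead of the raw rise means the same threshold can be applied to service data of very different magnitudes.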
7. The method according to claim 1, wherein the method further comprises:
acquiring a third data sequence of a Kth time period, wherein K is a positive number larger than zero, and the Mth time period is later than the Kth time period;
and inputting the third data sequence into a regression model to obtain a third data sequence trend result.
8. The method of claim 7, wherein inputting the third data sequence into the regression model to obtain the third data sequence trend result comprises:
inputting the third data sequence into the regression model to obtain a fitting slope; and
obtaining the third data sequence trend result based on the fitting slope.
9. The method of claim 7, wherein obtaining the detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result further comprises:
obtaining the detection result of the first data sequence according to the waveform similarity result, the second data sequence trend result, and the third data sequence trend result.
10. The method of claim 1, wherein first data in the first data sequence or second data in the second data sequence comprises at least one of:
transaction data, device operation data, and network traffic data.
11. A time series data abnormality detection apparatus, characterized by comprising:
a first acquisition module configured to acquire a first data sequence of an N-th time period and a second data sequence of an M-th time period, wherein N and M are positive numbers greater than zero, the N-th time period is later than the M-th time period, and the first data sequence and the second data sequence each consist of a plurality of pieces of service data arranged in time-series order;
a first obtaining module configured to obtain a waveform similarity result between the first data sequence and the second data sequence according to the similarity between the first data sequence and the second data sequence and the data volume of the first data sequence;
a calculation module configured to calculate a second data sequence trend result according to the time information of the N-th time period and the time information of the M-th time period, wherein the second data sequence trend result represents the data change trend of the second data sequence;
a second obtaining module configured to obtain a detection result of the first data sequence according to the waveform similarity result and the second data sequence trend result; and
a display module configured to display the detection result on an interactive interface in a case where the detection result of the first data sequence indicates an anomaly.
12. An electronic device, comprising:
one or more processors; and
a memory for storing one or more computer programs,
characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1 to 10.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, realizes the steps of the method according to any one of claims 1-10.
14. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410329000.XA CN118114180A (en) 2024-03-21 2024-03-21 Time series data abnormity detection method, device, equipment, storage medium and program product

Publications (1)

Publication Number Publication Date
CN118114180A true CN118114180A (en) 2024-05-31

Family

ID=91212138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410329000.XA Pending CN118114180A (en) 2024-03-21 2024-03-21 Time series data abnormity detection method, device, equipment, storage medium and program product

Country Status (1)

Country Link
CN (1) CN118114180A (en)

Similar Documents

Publication Publication Date Title
US10171335B2 (en) Analysis of site speed performance anomalies caused by server-side issues
US20190370163A1 (en) Method and apparatus for outputting information
US11900163B2 (en) Autonomous management of computing systems
CN113515399A (en) Data anomaly detection method and device
US20140366140A1 (en) Estimating a quantity of exploitable security vulnerabilities in a release of an application
CN115439160A (en) Anomaly monitoring method, apparatus, device, medium, and program product
CN109885564B (en) Method and apparatus for transmitting information
CN114218283A (en) Abnormality detection method, apparatus, device, and medium
CN117934154A (en) Transaction risk prediction method, model training method, device, equipment, medium and program product
CN118114180A (en) Time series data abnormity detection method, device, equipment, storage medium and program product
CN115795345A (en) Information processing method, device, equipment and storage medium
US11436608B1 (en) Commercial credit card system
CN113129127A (en) Early warning method and device
CN115312208B (en) Method, device, equipment and medium for displaying treatment data
CN116974871A (en) System performance evaluation method, device, equipment and storage medium
CN116483716A (en) Test information generation method, device, equipment and storage medium
CN116434365A (en) Request response method, device, equipment and storage medium
CN114328151A (en) Operation and maintenance event relation mining method, device, equipment and medium
CN116932326A (en) Server fault monitoring method, device, equipment, medium and program product
CN118521398A (en) Risk prediction method and apparatus, device, storage medium, and program product
CN114611915A (en) Team maturity assessment method, apparatus, device, medium and program product
US20140214498A1 (en) System and method for ensuring timing study quality in a service delivery environment
CN114254054A (en) Abnormality detection method, apparatus, device, and medium
CN118297709A (en) Risk user assessment method, apparatus, device, medium and program product
CN118469708A (en) Product recommendation method, device, apparatus, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination