CN111612082A - Method and device for detecting abnormal subsequence in time sequence - Google Patents

Method and device for detecting abnormal subsequence in time sequence

Info

Publication number
CN111612082A
Authority
CN
China
Prior art keywords
time
point
probability
subsequence
time sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010456099.1A
Other languages
Chinese (zh)
Other versions
CN111612082B (en)
Inventor
翟波
张亚
曾海芳
覃桢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Xiaopenguin Medical Technology Co ltd
Original Assignee
Hebei Xiaopenguin Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Xiaopenguin Medical Technology Co ltd filed Critical Hebei Xiaopenguin Medical Technology Co ltd
Priority to CN202010456099.1A priority Critical patent/CN111612082B/en
Publication of CN111612082A publication Critical patent/CN111612082A/en
Application granted granted Critical
Publication of CN111612082B publication Critical patent/CN111612082B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The embodiment of the invention provides a method and equipment for detecting an abnormal subsequence in a time sequence. The method comprises the following steps: forming a tuple from a single numerical value and a single time point, forming a time sequence from a plurality of tuples, and defining the similarity of different time sequences at any time point; constructing a plurality of split points to divide the numerical value space of the time sequence into a plurality of numerical value intervals, acquiring the probability density of the time sequence, acquiring from the probability density the probability that any time point in the time sequence falls into any numerical value interval, constructing an interval table from the probability and the plurality of numerical value intervals, and constructing an extended interval table from the interval table; and acquiring the weight, in the extended interval table, of each time point of each subsequence of the time sequence, and taking the average of all the weights as the score of each subsequence, where the smaller the score, the higher the probability that the subsequence is abnormal. The invention ensures the detection precision and reliability of the abnormal subsequence.

Description

Method and device for detecting abnormal subsequence in time sequence
Technical Field
The embodiment of the invention relates to the technical field of data mining, in particular to a method and equipment for detecting an abnormal subsequence in a time sequence.
Background
In real life, every field produces large amounts of time-series data, such as patient electrocardiogram and electroencephalogram data, industrial sensor data, and network traffic data. Time-series data is data ordered by the time at which it was generated. It therefore records how a quantity fluctuates along the time dimension, and the abnormal subsequences it may contain can carry more important information than the far more numerous normal subsequences. For example, abnormal electrocardiogram data may indicate that a patient suffers from some type of heart disease, and abnormal electroencephalogram data may be caused by brain diseases such as epilepsy. Detecting abnormal subsequences (patterns) in time series is therefore an important field: most of the data in a time series containing abnormal patterns appears normal and the abnormal patterns occur very rarely, yet these rare patterns carry very important information. Unsupervised time-series anomaly detection algorithms do not need labeled data and belong to the lazy-learning family of machine learning algorithms. An unsupervised abnormal-subsequence detection algorithm judges abnormality by comparing pairs of subsequences within a time series. However, time-series data is dynamic and often high-dimensional, so pairwise comparison of subsequences tends to require a large time overhead; moreover, transforming the time-series representation often loses information in the time dimension, which degrades detection accuracy. Research on detecting abnormal subsequences in time-series data is therefore of great practical significance.
Therefore, it is an urgent technical problem in the art to develop a method for detecting an abnormal subsequence in a time sequence, which can effectively overcome the above-mentioned drawbacks in the related art.
Disclosure of Invention
In view of the above problems in the prior art, embodiments of the present invention provide a method and an apparatus for detecting an abnormal subsequence in a time sequence.
In a first aspect, an embodiment of the present invention provides a method for detecting an abnormal subsequence in a time series, including: forming a tuple from a single numerical value and a single time point, forming a time sequence from a plurality of tuples, and defining the similarity of different time sequences at any time point; constructing a plurality of split points to divide the numerical value space of the time sequence into a plurality of numerical value intervals, acquiring the probability density of the time sequence, acquiring from the probability density the probability that any time point in the time sequence falls into any numerical value interval, constructing an interval table from the probability and the plurality of numerical value intervals, and constructing an extended interval table from the interval table; acquiring the weight, in the extended interval table, of each time point of each subsequence of the time sequence, and taking the average of all the weights as the score of each subsequence, where the smaller the score, the higher the probability that the subsequence is abnormal; wherein the value space is composed of all values in the tuples; the probability that any numerical point falls into any numerical interval is the same.
On the basis of the content of the above method embodiment, the method for detecting an abnormal subsequence in a time series provided in the embodiment of the present invention, which uses a single numerical value and a single time point to form a tuple, and forms a plurality of tuples into a time series, includes:
$$P = \{(t_1, p_1), (t_2, p_2), (t_3, p_3), \dots, (t_n, p_n)\}$$
wherein $n$ is the length of the time sequence and is an arbitrary integer; $(t_n, p_n)$ is the tuple; $P$ is the time series; $t_n$ is the single time point; $p_n$ is the single value.
On the basis of the content of the above method embodiment, in the method for detecting an abnormal subsequence in a time series provided in the embodiment of the present invention, defining the similarity of different time series at any time point includes: if, at any time point from $t_1$ to $t_n$, the values of the first time series and the second time series lie in the same numerical interval, the first time series and the second time series are judged to be similar at that time point.
On the basis of the content of the above method embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, the probability density is:
Figure BDA0002509298290000021
the probability is:
Figure BDA0002509298290000022
wherein $x$ is any time point, $S$ is the number of value intervals, and $\beta_i$ is the $i$-th split point; $i = 0, \dots, S-1$.
On the basis of the content of the above method embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, constructing the plurality of split points includes:
Figure BDA0002509298290000031
Figure BDA0002509298290000032
where $p'$ is the derivative of $p$ with respect to time and $G$ is the constructed function; if $G$ is zero, $\beta_i$ is determined to be a split point; if $G$ is not zero, $\beta_i$ is determined not to be a split point.
On the basis of the content of the above method embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, an interval table is constructed according to the probability and a plurality of numerical intervals, and accordingly, elements of the interval table include:
Figure BDA0002509298290000033
wherein j is the jth numerical interval; ITable is an element of the interval table.
On the basis of the content of the above method embodiment, in the method for detecting an abnormal subsequence in a time series provided in the embodiment of the present invention, taking the average of all the weights as the score of each subsequence includes:
Figure BDA0002509298290000034
Figure BDA0002509298290000035
wherein $t_i$ is the $i$-th time point; $score(t_i)$ is the weight of time point $t_i$ in the extended interval table; $score(P)$ is the score of the subsequence; $w$ is a weight; $r_{j+1,i}$ is the compactness coefficient between the position of time point $t_i$ in the numerical space and the adjacent upper interval; $r_{j-1,i}$ is the compactness coefficient between the position of time point $t_i$ in the numerical space and the adjacent lower interval.
In a second aspect, an embodiment of the present invention provides an apparatus for detecting an abnormal subsequence in a time series, including:
the sequence construction module is used for forming a tuple by adopting a single numerical value and a single time point, forming a plurality of tuples into a time sequence and defining the similarity of different time sequences at any time point;
the extended interval table building module is used for building a plurality of split points to divide a numerical value space in a time sequence into a plurality of numerical value intervals, obtaining the probability density of the time sequence, obtaining the probability that any time point in the time sequence falls into any numerical value interval according to the probability density, building an interval table according to the probability and the plurality of numerical value intervals, and building an extended interval table according to the interval table;
an anomaly determination module, configured to obtain the weight, in the extended interval table, of each time point of each subsequence of the time sequence, and to take the average of all the weights as the score of each subsequence, where the smaller the score, the higher the probability that the subsequence is abnormal;
wherein the value space is composed of all values in the tuples; the probability that any numerical point falls into any numerical interval is the same.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of detecting an abnormal subsequence in a time series as provided by any of the various possible implementations of the first aspect.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method for detecting an abnormal subsequence in a time series, as provided in any of the various possible implementations of the first aspect.
According to the method and the device for detecting the abnormal subsequence in the time sequence, provided by the embodiment of the invention, the time sequence and the similarity of the time sequence are redefined, the numerical space is divided into a plurality of numerical intervals, the probability density and the corresponding falling probability of the time sequence are further obtained, an extended interval table is constructed on the basis, and the subsequence of the time sequence is scored according to the weight in the extended interval table, so that the algorithm detection efficiency can be improved on the premise of ensuring the complete information of the time sequence in the time dimension, and the detection precision and the reliability of the abnormal subsequence are ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below to the drawings required for the description of the embodiments or the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a method for detecting an abnormal subsequence in a time series according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of positions of time points of electrocardiographic data in a numerical space according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating similarity of numerical points at the same time point in different time sequences according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for detecting an abnormal subsequence in a time series according to an embodiment of the present invention;
FIG. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In addition, technical features of various embodiments or individual embodiments provided by the invention can be arbitrarily combined with each other to form a feasible technical solution, but must be realized by a person skilled in the art, and when the technical solution combination is contradictory or cannot be realized, the technical solution combination is not considered to exist and is not within the protection scope of the present invention.
The embodiment of the invention provides a method for detecting an abnormal subsequence in a time sequence, and the method comprises the following steps of:
101. adopting a single numerical value and a single time point to form a tuple, forming a plurality of tuples into a time sequence, and defining the similarity of different time sequences at any time point;
102. constructing a plurality of splitting points to divide a numerical value space in a time sequence into a plurality of numerical value intervals, acquiring the probability density of the time sequence, acquiring the probability that any time point in the time sequence falls into any numerical value interval according to the probability density, constructing an interval table according to the probability and the plurality of numerical value intervals, and constructing an extended interval table according to the interval table;
103. acquiring the weight, in the extended interval table, of each time point of each subsequence of the time sequence, and taking the average of all the weights as the score of each subsequence, where the smaller the score, the higher the probability that the subsequence is abnormal;
wherein the value space is composed of all values in the tuples; the probability that any numerical point falls into any numerical interval is the same.
Based on the content of the foregoing method embodiment, as an optional embodiment, the method for detecting an abnormal subsequence in a time series provided in the embodiment of the present invention, where a single numerical value and a single time point are used to form a tuple, and a plurality of tuples are formed into the time series, includes:
$$P = \{(t_1, p_1), (t_2, p_2), (t_3, p_3), \dots, (t_n, p_n)\} \tag{1}$$
wherein $n$ is the length of the time sequence and is an arbitrary integer; $(t_n, p_n)$ is the tuple; $P$ is the time series; $t_n$ is the single time point; $p_n$ is the single value.
Specifically, assume a time series of the form of equation (1). If each tuple $(t_i, p_i)$ is regarded as a coordinate in two-dimensional space, it locates a point in that space. The tuple $(t_i, p_i)$ can thus be understood as using $p_i$ to represent the position of time point $t_i$ in the numerical space. The representation of a time series under this interpretation is illustrated with one electrocardiogram record from ECG200, as shown in FIG. 2. In FIG. 2, the subscript of $p_i$ in the time series $P$ denotes $t_i$ (from $t_0$ to $t_{95}$), so the figure shows the position of each time point of the electrocardiographic data in the numerical space.
Based on the content of the foregoing method embodiment, as an optional embodiment, in the method for detecting an abnormal subsequence in a time series provided in the embodiment of the present invention, defining the similarity of different time series at any time point includes: if, at any time point from $t_1$ to $t_n$, the values of the first time series and the second time series lie in the same numerical interval, the first time series and the second time series are judged to be similar at that time point.
Specifically, for time series of equal (arbitrary) length, the time points $t_i$ are identical, so the series can differ only in the value $p_i$ at each time point, and $p_i$ represents the position of the corresponding $t_i$ in the numerical space. The similarity of time series can therefore be computed by measuring how close corresponding time points are in the numerical space. If the positions of a time point in the numerical space are adjacent, the series are similar at that time point; if the positions are far apart, the series are dissimilar at that time point. That is, if time point $t_i$ of time series $P$ and $Q$ lies in the same value interval of the numerical space, then $P$ and $Q$ are adjacent, also called similar, at time point $t_i$. For example, in FIG. 3 the whole numerical space is divided into five intervals by straight lines; the time series $P$ and $Q$ are adjacent at time point $t_{12}$ and not adjacent at time point $t_0$ (the series span time points $t_0$ to $t_{18}$).
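The interval-based notion of similarity can be illustrated with a minimal Python sketch. The function names, the split points and the sample values are hypothetical and not taken from the patent, and assigning a value that equals a split point to the upper interval is an assumption of this sketch.

```python
from bisect import bisect_right

def interval_index(value, split_points):
    """Return the index (0..S-1) of the value interval containing `value`.

    `split_points` is the sorted list of interior split points; a value equal
    to a split point is placed in the upper interval (an assumption).
    """
    return bisect_right(split_points, value)

def similar_at(p, q, i, split_points):
    """True if series p and q lie in the same value interval at time index i."""
    return interval_index(p[i], split_points) == interval_index(q[i], split_points)

# Hypothetical data: four split points give five intervals, as in FIG. 3.
splits = [-1.0, -0.3, 0.3, 1.0]
P = [1.4, 0.2, -0.5]
Q = [-1.2, 0.1, -0.4]
print([similar_at(P, Q, i, splits) for i in range(len(P))])  # [False, True, True]
```

Here the two series are dissimilar at the first time index, where they fall in the top and bottom intervals, and similar at the other two.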
Based on the content of the foregoing method embodiment, as an optional embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, the probability density is:
Figure BDA0002509298290000071
the probability is:
Figure BDA0002509298290000072
wherein $x$ is any time point, $S$ is the number of value intervals, and $\beta_i$ is the $i$-th split point; $i = 0, \dots, S-1$.
Equal probability means that any data point is equally likely to fall within any numerical interval. Dividing the numerical space into $S$ numerical intervals requires determining $S-1$ split points $\beta_1 < \beta_2 < \beta_3 < \dots < \beta_{S-1}$; the $S$ intervals are then $[\beta_0, \beta_1], [\beta_1, \beta_2], \dots, [\beta_{S-1}, \beta_S]$, where $\beta_0 = -\infty$ and $\beta_S = \infty$. Assuming that the time series follows the standard normal distribution $X \sim N(0, 1)$, the probability density function of the time series is given by formula (2); the probability that any time point of the time series falls in any value interval is then calculated as shown in formula (3).
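Formulas (2) and (3) survive in this text only as image placeholders. Under the stated assumption $X \sim N(0, 1)$ and the equal-probability requirement, a reading consistent with the surrounding definitions would be the following; it is a reconstruction, not the patent's typeset formulas.

```latex
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2},
\qquad
\int_{\beta_i}^{\beta_{i+1}} f(x)\,dx = \frac{1}{S},
\quad i = 0, \dots, S-1,
\quad \beta_0 = -\infty,\ \beta_S = +\infty .
```

With $S = 4$, for instance, this reading places the split points at the quartiles of the standard normal distribution, approximately $-0.674$, $0$ and $0.674$.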
Based on the content of the foregoing method embodiment, as an optional embodiment, the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, where the constructing the plurality of split points, includes:
Figure BDA0002509298290000073
Figure BDA0002509298290000074
where $p'$ is the derivative of $p$ with respect to time and $G$ is the constructed function; if $G$ is zero, $\beta_i$ is determined to be a split point; if $G$ is not zero, $\beta_i$ is determined not to be a split point.
To verify the effectiveness of the algorithm with more intervals, Newton's method is used to calculate the split points. First, the constructed function $G(x)$ is given in equation (4), where $\beta_i$ is the known previous split point ($\beta_0 = -\infty$, $\beta_S = \infty$). An iteration that solves for $\beta_{i+1}$ can then be constructed. After each iteration, equation (4) is used to check the stopping condition (if $G$ equals 0, the point is judged to be a split point; otherwise it is not), and each split point is obtained in this way.
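The Newton iteration can be sketched in Python as follows. Because equation (4) is not reproduced in this text, the form of $G$ used here, $G(x) = \Phi(x) - \Phi(\beta_i) - 1/S$ with $\Phi$ the standard normal cumulative distribution function, is an assumption chosen to match the equal-probability requirement; the function names and the tolerance are likewise illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)

def next_split_point(prev_cdf, S, x0=0.0, tol=1e-12, max_iter=100):
    """Solve G(x) = Phi(x) - prev_cdf - 1/S = 0 with Newton's method.

    prev_cdf is Phi(beta_i) for the previous split point (0.0 for beta_0 = -inf).
    The form of G is an assumption; the patent's equation (4) is not reproduced.
    """
    x = x0
    for _ in range(max_iter):
        g = STD_NORMAL.cdf(x) - prev_cdf - 1.0 / S
        if abs(g) < tol:              # stopping condition: G is (numerically) zero
            break
        x -= g / STD_NORMAL.pdf(x)    # Newton step; G'(x) is the normal density
    return x

def split_points(S):
    """Return the S-1 interior split points of S equiprobable intervals."""
    points = []
    prev_cdf = 0.0
    for _ in range(S - 1):
        start = points[-1] if points else 0.0
        x = next_split_point(prev_cdf, S, x0=start)
        points.append(x)
        prev_cdf = STD_NORMAL.cdf(x)
    return points

print(split_points(5))
```

The result can be cross-checked against `NormalDist().inv_cdf(k / S)`, which returns the same quantiles directly; the iterative version is shown only because the text describes Newton's method.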
Based on the content of the foregoing method embodiment, as an optional embodiment, in the method for detecting an abnormal subsequence in a time sequence provided in the embodiment of the present invention, the interval table is constructed according to the probability and a plurality of numerical intervals, and accordingly, the elements of the interval table include:
Figure BDA0002509298290000081
wherein j is the jth numerical interval; ITable is an element of the interval table.
Specifically, each Interval of the Interval Table (ITable) counts a set of time points corresponding to data points located in the Interval. Because the time of the time sequence is consistent, the time point set of each interval can be converted into binary representation; therefore, each interval table is a two-dimensional matrix of S × n, S represents the number of value intervals, n represents the length of the time series, and each element in the interval table can only take a value of 0 or 1. If the element is 1, the position of the time point in the numerical space is in the corresponding section, otherwise, the position of the time point in the numerical space does not fall in the corresponding section. The form of each element in Itable is as shown in (6). For subsequences with equal time series length, the converted interval tables are not only similar in structure, but also the number of binary 1 appearing in each interval table is the same. The position of binary 1 appearing in the interval table represents the difference of the interval table. Then, combining the "few and different" features of abnormal data, it can be found that if the data point of the subsequence at some time point is abnormal, the position of 1 in the interval table corresponding to the subsequence will be different from the position of 1 in most other interval tables.
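A minimal Python sketch of building one interval table as an S × n binary matrix is given below. The function name, split points and sample values are hypothetical, and placing a value that equals a split point in the upper interval is an assumption of this sketch.

```python
from bisect import bisect_right

def build_itable(values, split_points):
    """Return an S x n matrix of 0/1 entries.

    Entry [j][i] is 1 when values[i] lies in value interval j, else 0.
    """
    S = len(split_points) + 1
    table = [[0] * len(values) for _ in range(S)]
    for i, v in enumerate(values):
        table[bisect_right(split_points, v)][i] = 1
    return table

splits = [-0.5, 0.5]                 # hypothetical split points, so S = 3
sub = [0.9, 0.1, -0.7, 0.2]          # hypothetical subsequence, n = 4
for row in build_itable(sub, splits):
    print(row)
# [0, 0, 1, 0]
# [0, 1, 0, 1]
# [1, 0, 0, 0]
```

Each column contains exactly one 1, which is why subsequences of equal length always produce tables with the same number of 1s and differ only in where the 1s sit.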
Based on the content of the foregoing method embodiment, as an optional embodiment, in the method for detecting an abnormal subsequence in a time series provided in the embodiment of the present invention, taking the average of all the weights as the score of each subsequence includes:
Figure BDA0002509298290000082
Figure BDA0002509298290000083
wherein $t_i$ is the $i$-th time point; $score(t_i)$ is the weight of time point $t_i$ in the extended interval table; $score(P)$ is the score of the subsequence; $w$ is a weight; $r_{j+1,i}$ is the compactness coefficient between the position of time point $t_i$ in the numerical space and the adjacent upper interval; $r_{j-1,i}$ is the compactness coefficient between the position of time point $t_i$ in the numerical space and the adjacent lower interval.
Specifically, each extended interval table (EITable) is a matrix of size S × n, where n is the length of the subsequences that make up the EITable and S is the number of value intervals. Each element $w_{j,i}$ is an integer greater than or equal to 0 and represents the weight, computed over the data set, of time point $t_i$ in value interval $Interval_j$. The structure of the EITable is shown in Table 1.
TABLE 1
Figure BDA0002509298290000091
After the EITable is constructed, the distribution of the weights of each time point of the time-series data set over the S intervals is available. The larger a weight, the more subsequences in the data set place that time point in the interval; the smaller the weight, the fewer subsequences place that time point there. From the "few and different" characteristics of abnormal data, it can be inferred that if a subsequence is abnormal, the distribution in the value space of some or even all of its time points differs significantly from that of the corresponding time points of most other subsequences. This difference shows up in the EITable as small weights for the time points of the abnormal subsequence. The abnormality of a subsequence is therefore judged by calculating its weight score in the EITable, as follows: based on the constructed EITable, the weight of each time point of each subsequence is looked up, and the average of the weights of all time points of the subsequence is taken as its score, as shown in formula (7). The larger the score of subsequence P calculated with formula (7), the more likely P fits the distribution of most subsequences; the smaller the score, the less likely P matches the distribution of most non-self-matching subsequences, and the more likely P is abnormal.
$score(t_i)$ is the weight of time point $t_i$ in the EITable and consists of three parts: the weight of the interval to which $t_i$ belongs and the weights of the two intervals adjacent above and below it. If $t_i$ belongs to the first or the last value interval, the weight of $t_i$ obtained by querying the EITable consists of only two parts: the weight of the interval to which $t_i$ belongs and the weight of the single adjacent interval. When calculating the contribution of an adjacent interval, the compactness between the position of time point $t_i$ in the numerical space and that adjacent interval must be calculated first. If the position of $t_i$ is very compact with the adjacent interval, $t_i$ can approximately be classified into that interval; if it is some distance from the adjacent interval, $t_i$ has a neighborhood relationship with only a small part of the data in that interval. The formula for calculating $score(t_i)$ is therefore as shown in formula (8).
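The scoring procedure can be sketched in Python under several loudly stated assumptions, since formulas (7) and (8) and Table 1 are not reproduced in this text. The sketch assumes that the EITable weight $w_{j,i}$ counts the subsequences whose value at time index $i$ falls in interval $j$, that $score(t_i)$ adds the own-interval weight to the neighbouring weights scaled by compactness coefficients, and that those coefficients are fixed placeholders (`r_up`, `r_down`) rather than computed values. All names and data are hypothetical.

```python
from bisect import bisect_right

def interval_index(value, split_points):
    """Index of the value interval containing `value`."""
    return bisect_right(split_points, value)

def build_eitable(subsequences, split_points):
    """Assumed construction: w[j][i] counts the subsequences whose value at
    time index i falls in interval j."""
    S, n = len(split_points) + 1, len(subsequences[0])
    w = [[0] * n for _ in range(S)]
    for sub in subsequences:
        for i, v in enumerate(sub):
            w[interval_index(v, split_points)][i] += 1
    return w

def score_point(w, j, i, r_up=0.5, r_down=0.5):
    """Assumed form of score(t_i): own weight plus scaled neighbour weights.

    r_up and r_down are placeholder compactness coefficients; the first and
    last intervals have only one neighbour.
    """
    total = w[j][i]
    if j + 1 < len(w):
        total += r_up * w[j + 1][i]
    if j - 1 >= 0:
        total += r_down * w[j - 1][i]
    return total

def score_subsequence(sub, w, split_points):
    """Average of the per-time-point weights of one subsequence."""
    pts = [score_point(w, interval_index(v, split_points), i)
           for i, v in enumerate(sub)]
    return sum(pts) / len(pts)

splits = [-0.5, 0.5]
data = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.1], [0.2, 0.1, 0.0], [0.1, 0.9, -0.9]]
w = build_eitable(data, splits)
scores = [score_subsequence(s, w, splits) for s in data]
print(scores)
```

In this toy data set the fourth subsequence leaves the middle interval at two of its three time points, so it receives the lowest score, which matches the rule that a smaller score indicates a more likely anomaly.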
According to the method for detecting the abnormal subsequence in the time sequence, provided by the embodiment of the invention, the time sequence and the similarity of the time sequence are redefined, the numerical space is divided into a plurality of numerical intervals, the probability density and the corresponding falling probability of the time sequence are further obtained, an extended interval table is constructed on the basis, and the subsequence of the time sequence is scored according to the weight in the extended interval table, so that the algorithm detection efficiency can be improved on the premise of ensuring the completeness of the information of the time sequence in the time dimension, and the detection precision and the reliability of the abnormal subsequence are ensured.
In order to more clearly illustrate the essence of the technical solution of the present invention, on the basis of the above-mentioned embodiments, an overall embodiment is proposed, which shows the overall view of the technical solution of the present invention. It should be noted that the whole embodiment is only for further embodying the technical essence of the present invention, and is not intended to limit the scope of the present invention, and those skilled in the art can obtain any combination type technical solution meeting the essence of the technical solution of the present invention by combining technical features based on the various embodiments of the present invention, and as long as the combined technical solution can be practically implemented, the combined technical solution is within the scope of the present patent.
First, the time series data sets selected for this experiment are shown in Table 2 (UCR experiment data sets) below:
TABLE 2
Figure BDA0002509298290000101
Figure BDA0002509298290000111
Table 2 contains 4 different types of time-series data sets whose series lengths range from 65 to 2709 and whose proportions of abnormal time series differ. This diversity allows the effectiveness of the proposed algorithm for time-series anomaly detection to be verified from different aspects. To quantify how accurately the algorithm detects abnormal time series, the AUC metric is used to evaluate it. AUC is the area enclosed by the ROC curve and the two coordinate axes. The ROC curve evaluates the performance of a binary classifier: the data samples are sorted by the classifier's prediction result, different thresholds are selected in that order, samples whose prediction result is larger than the threshold are taken as positive examples, and samples whose prediction result is smaller than the threshold are taken as negative examples. Each threshold yields a pair (FPR, TPR), where FPR is the false positive rate and TPR is the true positive rate. Plotting these pairs with FPR as the abscissa and TPR as the ordinate gives the ROC curve. The true positive rate is also called sensitivity in machine learning, and the false positive rate is also called the false alarm probability.
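The ROC and AUC computation described above can be sketched in plain Python. The function name and sample data are hypothetical. Because a smaller subsequence score means a more likely anomaly, the scores are negated before ranking so that larger values indicate the positive (abnormal) class; the sketch also assumes both classes are present.

```python
def roc_auc(scores, labels):
    """Return (roc_points, auc) for binary labels (1 = positive, 0 = negative).

    Larger scores are treated as more likely positive. Tied scores are handled
    as one threshold, and the area is computed with the trapezoid rule.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    idx = 0
    while idx < len(order):
        threshold = scores[order[idx]]
        while idx < len(order) and scores[order[idx]] == threshold:
            if labels[order[idx]] == 1:
                tp += 1
            else:
                fp += 1
            idx += 1
        points.append((fp / neg, tp / pos))   # one (FPR, TPR) pair per threshold
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc

# Hypothetical subsequence scores and ground truth (1 = abnormal).
sub_scores = [3.67, 3.67, 3.0, 3.5, 2.5]
is_abnormal = [0, 0, 1, 0, 1]
points, auc = roc_auc([-s for s in sub_scores], is_abnormal)
print(auc)  # 1.0
```

An AUC of 1.0 here means the two abnormal subsequences have the two lowest scores, so every threshold that separates them from the rest produces no false positives.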
The selected comparison algorithms are: an angle-based anomaly detection algorithm (FastVOA), proposed in 2012; an anomaly detection algorithm combining piecewise aggregate approximation and a random walk model (PAPR-RW), proposed in 2017; a kernel-density-based anomaly detection algorithm (RDOS), proposed in 2017; and an interval-set-based time series anomaly detection algorithm (Internal), proposed in 2018. The parameters of the comparison algorithms follow the settings suggested in their references: the number of neighbors in the RDOS algorithm is set to 10; the number of hash functions in the FastVOA algorithm is set to 100; the Internal algorithm uses the parameters proposed in its reference, for example a boundary-width coefficient of 0.2; the number of subspaces in the PAPR-RW algorithm is set, as suggested, to a value in the range 6 to 9, and its other three parameters are set to 0.3, 0.4 and 0.3 respectively. The AUC scores of the respective algorithms on the data sets are shown in Table 3, where the two best results on each data set are shown in bold and NA indicates that the algorithm could not be run on that data set in the current experimental environment.
TABLE 3
Figure BDA0002509298290000121
In Table 3, the EITable column gives the experimental results of the algorithm proposed herein, and the other columns give the results of the selected comparison algorithms. The results show that EITable achieves better detection results on most of the time series data sets and improves the AUC score over the other algorithms to varying degrees. For example: on the MoteStrain data set, the two proposed algorithms improve on the other algorithms by more than ten percent; on the Lightning2 data set, the improvement is ten percent over the RDOS algorithm and nearly twenty percent over the RDOS algorithm; on the ECG200 data set, there is also a ten percent improvement over the other algorithms; and relatively good results are also obtained on the three DiatomSizeReduction data sets.
In addition to verifying effectiveness under the AUC index, the difference in CPU time between the proposed algorithm and the comparison algorithms is also measured. The CPU running times of the respective methods on the data sets of Table 2 are shown in Table 4 (running-time comparison on different time series data sets). The results in Table 4 show that EITable requires less running time on most data sets, since it has only linear time complexity. The Internal algorithm segments the time series and computes a similarity matrix, which takes little time on small data sets, so it achieves the best running time on some of the data sets. The RDOS algorithm has to compute the Euclidean distances between time series and find the k nearest neighbors, so its running time is long. PAPR-RW requires the most running time, because it first has to convert the representation of the time series and compute a similarity matrix, and then feed the similarity matrix into the RW model for many rounds of iterative optimization.
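The text does not state how the CPU time was measured. As a minimal sketch, assuming Python and its standard `time.process_time` clock, a comparison between a single linear pass and a quadratic pairwise computation could be timed as below. The helper `cpu_seconds` and the two workloads `linear_scan` and `pairwise_distances` are hypothetical stand-ins, not the algorithms of Table 4.

```python
import time
from typing import Callable, List, Sequence


def cpu_seconds(fn: Callable[[], object], repeats: int = 3) -> float:
    """Smallest CPU time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.process_time()  # CPU time of this process, not wall-clock time
        fn()
        best = min(best, time.process_time() - start)
    return best


def linear_scan(values: Sequence[float]) -> float:
    """Stand-in for a linear-time method: one pass over the data."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def pairwise_distances(values: Sequence[float]) -> List[List[float]]:
    """Stand-in for a distance-matrix method: quadratic in the input length."""
    return [[abs(a - b) for b in values] for a in values]
```

For example, `cpu_seconds(lambda: linear_scan(data))` and `cpu_seconds(lambda: pairwise_distances(data))` can be compared for the same `data`; the gap between the two typically widens as the input length grows.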
TABLE 4
Figure BDA0002509298290000131
The implementation basis of the various embodiments of the present invention is realized by programmed processing performed by a device having a processor function. Therefore, in engineering practice, the technical solutions and functions thereof of the embodiments of the present invention can be packaged into various modules. Based on this reality, on the basis of the above embodiments, the embodiments of the present invention provide an apparatus for detecting an abnormal subsequence in time series, which is used to execute the method for detecting an abnormal subsequence in time series in the above method embodiments. Referring to fig. 4, the apparatus includes:
the sequence construction module 401 is configured to adopt a single numerical value and a single time point to form a tuple, form a time sequence with a plurality of tuples, and define the similarity of different time sequences at any time point;
an extended interval table constructing module 402, configured to construct a plurality of split points to divide a numerical space in a time sequence into a plurality of numerical intervals, obtain a probability density of the time sequence, obtain a probability that any time point in the time sequence falls into any numerical interval according to the probability density, construct an interval table according to the probability and the plurality of numerical intervals, and construct an extended interval table according to the interval table;
an anomaly determination module 403, configured to obtain the weight of each time point of each subsequence of the time sequence in the extended interval table, and take the average of all the weights as the score of each subsequence, where the smaller the score, the lower the probability that the subsequence is determined to be anomalous;
wherein the value space is composed of all values in the tuples; the probability that any numerical point falls into any numerical interval is the same.
The device for detecting an abnormal subsequence in a time sequence provided by the embodiment of the invention uses the sequence construction module, the extended interval table construction module and the anomaly determination module. It redefines the time sequence and its similarity, divides the numerical space into a plurality of numerical intervals, obtains the probability density of the time sequence and the corresponding probability of falling into each interval, constructs the extended interval table on this basis, and scores the subsequences of the time sequence according to the weights in the extended interval table. This improves the detection efficiency of the algorithm while keeping the information of the time sequence complete in the time dimension, and ensures the precision and reliability of abnormal subsequence detection.
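The division of labor between the three modules can be sketched as follows. This is only an illustration under stated assumptions, not the patented formulas, which appear as figures in the original and are not reproduced in the text. Assumptions: split points are taken at empirical quantiles so that each interval receives roughly the same share of values; the interval table stores, for each time index, the fraction of reference series falling into each interval; and the per-point weight is a stand-in "rarity" value in which neighboring intervals contribute with a hypothetical factor `w`. All function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Series = List[Tuple[int, float]]  # [(t1, p1), (t2, p2), ...]


def split_points(values: Sequence[float], s: int) -> List[float]:
    """Interior split points at empirical quantiles (assumed equal-probability split)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // s] for k in range(1, s)]


def interval_index(value: float, splits: Sequence[float]) -> int:
    """Index of the numerical interval that contains `value`."""
    return bisect_right(splits, value)


def similar_at(a: Series, b: Series, i: int, splits: Sequence[float]) -> bool:
    """Two series are similar at time index i if their values share an interval."""
    return interval_index(a[i][1], splits) == interval_index(b[i][1], splits)


def build_interval_table(reference: Sequence[Series], splits: Sequence[float]) -> List[List[float]]:
    """table[i][j]: fraction of reference series whose value at time index i is in interval j."""
    length = len(reference[0])
    counts = [[0] * (len(splits) + 1) for _ in range(length)]
    for series in reference:
        for i, (_, p) in enumerate(series):
            counts[i][interval_index(p, splits)] += 1
    return [[c / len(reference) for c in row] for row in counts]


def point_weight(table: List[List[float]], i: int, j: int, w: float = 0.5) -> float:
    """Stand-in weight: how rarely interval j (and its neighbors) occurs at time index i."""
    row = table[i]
    support = row[j]
    if j + 1 < len(row):
        support += w * row[j + 1]  # adjacent upper interval
    if j - 1 >= 0:
        support += w * row[j - 1]  # adjacent lower interval
    return max(0.0, 1.0 - support)


def subsequence_score(sub: Series, start: int, table: List[List[float]],
                      splits: Sequence[float], w: float = 0.5) -> float:
    """Average of the per-point weights; a smaller score means less likely abnormal."""
    weights = [point_weight(table, start + k, interval_index(p, splits), w)
               for k, (_, p) in enumerate(sub)]
    return sum(weights) / len(weights)
```

Each step is a single pass over the data plus a binary search over the split points, which is in line with the linear running time attributed to the method above. A subsequence that follows the pattern seen in the reference series obtains a score near zero, while one that visits rarely occupied intervals obtains a larger score.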
The method of the embodiment of the invention is implemented on an electronic device, so the related electronic device needs to be introduced. To this end, an embodiment of the present invention provides an electronic device, as shown in fig. 5, including: at least one processor 501, a communication interface 504, at least one memory 502 and a communication bus 503, wherein the at least one processor 501, the communication interface 504 and the at least one memory 502 communicate with each other through the communication bus 503. The at least one processor 501 may call logic instructions in the at least one memory 502 to perform the following method: adopting a single numerical value and a single time point to form a tuple, forming a plurality of tuples into a time sequence, and defining the similarity of different time sequences at any time point; constructing a plurality of splitting points to divide a numerical value space in a time sequence into a plurality of numerical value intervals, acquiring the probability density of the time sequence, acquiring the probability that any time point in the time sequence falls into any numerical value interval according to the probability density, constructing an interval table according to the probability and the plurality of numerical value intervals, and constructing an extended interval table according to the interval table; acquiring the weight of each time point of each subsequence of the time sequence in the extended interval table, and taking the average of all the weights as the score of each subsequence, where the smaller the score, the lower the probability that the subsequence is determined to be abnormal; wherein the value space is composed of all values in the tuples; the probability that any numerical point falls into any numerical interval is the same.
Furthermore, the logic instructions in the at least one memory 502 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. For example, the steps include: adopting a single numerical value and a single time point to form a tuple, forming a plurality of tuples into a time sequence, and defining the similarity of different time sequences at any time point; constructing a plurality of splitting points to divide a numerical value space in a time sequence into a plurality of numerical value intervals, acquiring the probability density of the time sequence, acquiring the probability that any time point in the time sequence falls into any numerical value interval according to the probability density, constructing an interval table according to the probability and the plurality of numerical value intervals, and constructing an extended interval table according to the interval table; acquiring the weight of each time point of each subsequence of the time sequence in the extended interval table, and taking the average of all the weights as the score of each subsequence, where the smaller the score, the lower the probability that the subsequence is determined to be abnormal; wherein the value space is composed of all values in the tuples; the probability that any numerical point falls into any numerical interval is the same.
The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. Based on this recognition, each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In this patent, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for detecting an abnormal subsequence in a time series, comprising:
adopting a single numerical value and a single time point to form a tuple, forming a plurality of tuples into a time sequence, and defining the similarity of different time sequences at any time point;
constructing a plurality of splitting points to divide a numerical value space in a time sequence into a plurality of numerical value intervals, acquiring the probability density of the time sequence, acquiring the probability that any time point in the time sequence falls into any numerical value interval according to the probability density, constructing an interval table according to the probability and the plurality of numerical value intervals, and constructing an extended interval table according to the interval table;
acquiring the weight of each time point of each subsequence of the time sequence in the extended interval table, and taking the average of all the weights as the score of each subsequence, where the smaller the score, the lower the probability that the subsequence is determined to be abnormal;
wherein the value space is composed of all values in the tuples; the probability that any numerical point falls into any numerical interval is the same.
2. The method for detecting abnormal subsequences in time series according to claim 1, wherein said forming a tuple with a single value and a single time point, and forming several tuples into time series comprises:
P={(t1,p1),(t2,p2),(t3,p3),...,(tn,pn)}
wherein n is the length of the time sequence and is an arbitrary integer; (tn, pn) is a tuple; P is the time series; tn is a single time point; pn is a single value.
3. The method for detecting abnormal subsequences in time series as claimed in claim 2, wherein said defining similarity of different time series at any time point comprises: if the values of the first time sequence and the second time sequence at any time point from t1 to tn are in the same value interval, judging that the first time sequence and the second time sequence are similar at that time point.
4. The method of claim 2, wherein the probability density is:
Figure FDA0002509298280000011
the probability is:
Figure FDA0002509298280000021
wherein x is any time point, S is the number of value intervals, and βi is the i-th split point; i = 0, …, S-1.
5. The method for detecting abnormal subsequences in time series according to claim 4, wherein said constructing several split points comprises:
Figure FDA0002509298280000022
Figure FDA0002509298280000023
where p' is the derivative of p with respect to time and G is a constructed function; if G is zero, βi is determined to be a split point, and if G is not zero, βi is determined not to be a split point.
6. The method for detecting abnormal subsequences in time sequence according to claim 4, wherein, in said constructing an interval table according to said probability and several value intervals, the elements of said interval table comprise:
Figure FDA0002509298280000024
wherein j is the jth numerical interval; ITable is an element of the interval table.
7. The method of claim 6, wherein said taking the average of all the weights as the score of each of said subsequences comprises:
Figure FDA0002509298280000025
Figure FDA0002509298280000026
wherein ti is the i-th time point; score(ti) is the weight of time point ti in the extended interval table; score(p) is the score of the subsequence; w is a weight; r(j+1,i) is the compactness coefficient between the position of time point ti in the numerical space and the adjacent upper interval; r(j-1,i) is the compactness coefficient between the position of time point ti in the numerical space and the adjacent lower interval.
8. An apparatus for detecting an abnormal subsequence in a time series, comprising:
the sequence construction module is used for forming a tuple by adopting a single numerical value and a single time point, forming a plurality of tuples into a time sequence and defining the similarity of different time sequences at any time point;
the extended interval table building module is used for building a plurality of split points to divide a numerical value space in a time sequence into a plurality of numerical value intervals, obtaining the probability density of the time sequence, obtaining the probability that any time point in the time sequence falls into any numerical value interval according to the probability density, building an interval table according to the probability and the plurality of numerical value intervals, and building an extended interval table according to the interval table;
an anomaly determination module, configured to obtain the weight of each time point of each subsequence of the time sequence in the extended interval table, and take the average of all the weights as the score of each subsequence, where the smaller the score, the lower the probability that the subsequence is determined to be anomalous;
wherein the value space is composed of all values in the tuples; the probability that any numerical point falls into any numerical interval is the same.
9. An electronic device, comprising:
at least one processor, at least one memory, and a communication interface; wherein
the processor, the memory and the communication interface are communicated with each other;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202010456099.1A 2020-05-26 2020-05-26 Method and device for detecting abnormal subsequence in time sequence Active CN111612082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010456099.1A CN111612082B (en) 2020-05-26 2020-05-26 Method and device for detecting abnormal subsequence in time sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010456099.1A CN111612082B (en) 2020-05-26 2020-05-26 Method and device for detecting abnormal subsequence in time sequence

Publications (2)

Publication Number Publication Date
CN111612082A true CN111612082A (en) 2020-09-01
CN111612082B CN111612082B (en) 2023-06-23

Family

ID=72196337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010456099.1A Active CN111612082B (en) 2020-05-26 2020-05-26 Method and device for detecting abnormal subsequence in time sequence

Country Status (1)

Country Link
CN (1) CN111612082B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130107889A (en) * 2012-03-23 2013-10-02 삼성전자주식회사 Aparatus and method for detecting anomalous subsequence
CN104156473A (en) * 2014-08-25 2014-11-19 哈尔滨工业大学 LS-SVM-based method for detecting anomaly slot of sensor detection data
CN105574669A (en) * 2015-12-16 2016-05-11 国网山东省电力公司电力科学研究院 Space-time union data clustering analysis based abnormal status detection method for power transmission and transformation device
CN105678409A (en) * 2015-12-31 2016-06-15 哈尔滨工业大学 Adaptive and distribution-free time series abnormal point detection method
US20160299938A1 (en) * 2015-04-10 2016-10-13 Tata Consultancy Services Limited Anomaly detection system and method
CN106127249A (en) * 2016-06-24 2016-11-16 深圳市颐通科技有限公司 A kind of single time series exception subsequence detection method
CN106228002A (en) * 2016-07-19 2016-12-14 北京工业大学 A kind of high efficiency exception time series data extracting method based on postsearch screening
CN107528722A (en) * 2017-07-06 2017-12-29 阿里巴巴集团控股有限公司 Abnormal point detecting method and device in a kind of time series
US20180110471A1 (en) * 2016-10-21 2018-04-26 Tata Consultancy Services Limited Anomaly detection by self-learning of sensor signals
CN108647737A (en) * 2018-05-17 2018-10-12 哈尔滨工业大学 A kind of auto-adaptive time sequence variation detection method and device based on cluster
CN109542952A (en) * 2018-11-23 2019-03-29 中国民用航空上海航空器适航审定中心 A kind of detection method of time series abnormal point
CN109784042A (en) * 2018-12-29 2019-05-21 北京奇安信科技有限公司 The detection method of abnormal point, device, electronic equipment and storage medium in time series
CN109858522A (en) * 2018-12-29 2019-06-07 国网天津市电力公司电力科学研究院 A kind of management line loss abnormality recognition method based on data mining
CN109871401A (en) * 2018-12-26 2019-06-11 北京奇安信科技有限公司 A kind of time series method for detecting abnormality and device
CN109902703A (en) * 2018-09-03 2019-06-18 华为技术有限公司 A kind of time series method for detecting abnormality and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周大镯: "时间序列异常检测" *

Also Published As

Publication number Publication date
CN111612082B (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant