CN112966017A

CN112966017A - Abnormal subsequence detection method with indefinite length in time sequence

Info

Publication number: CN112966017A
Application number: CN202110226782.0A
Authority: CN
Inventors: 陈逸舟; 张丹; 熊晓菁
Original assignee: Beijing Qingmeng Shuhai Technology Co ltd
Current assignee: Beijing Qingmeng Shuhai Technology Co ltd
Priority date: 2021-03-01
Filing date: 2021-03-01
Publication date: 2021-06-15
Anticipated expiration: 2041-03-01
Also published as: CN112966017B

Abstract

The invention provides a method for detecting abnormal subsequences of indefinite length in a time series. In defining an abnormal subsequence, the method uses the mean or median of the k-nearest-neighbor distances. In computing subsequence distances, it applies parallel optimization on top of the STOMP algorithm. In setting the algorithm parameters, it takes a subsequence length range and a step size as input. In producing output, it can directly output the abnormal subsequences detected at each length, which reveals different anomalies, and it can also output an anomaly score and the abnormal subsequences determined from those detection results and a chosen evaluation index. The method can significantly improve the running efficiency and the detection accuracy of abnormal subsequence detection in time series.

Description

Abnormal subsequence detection method with indefinite length in time sequence

Technical Field

The invention relates to time series anomaly detection in the field of data mining algorithms, and in particular to a method for detecting abnormal subsequences of indefinite length in a time series.

Background

Anomaly detection is a broad technical field that is applied in a large number of real-world scenarios, and different business domains and data types generally call for different specialized methods. Time series anomaly detection, which identifies outliers in time-ordered data, is useful in a very wide range of applications. In financial markets, it can detect sudden changes in the stock market or abnormal patterns within a particular time window. In system operation diagnostics, it can be used to monitor equipment operating conditions, detect intrusions, and the like. In biology, because the arrangement of amino acids resembles time series data, time series detection methods can also be applied. Because the data generated in different fields have different characteristics and the business requirements differ, many different methods have been developed for time series anomaly detection. In terms of data dimension, they can be divided into univariate and multivariate anomaly detection methods; in terms of anomaly definition, into data point anomaly detection and window anomaly detection; and in terms of algorithm implementation, into supervised and unsupervised anomaly detection algorithms.

Currently, most time series anomaly detection algorithms that are studied and applied detect anomalies at single data points: they output an anomaly probability for each data point in the time series, and a threshold is then used to decide whether the point is anomalous. In practical application scenarios, however, it is often necessary to identify pattern anomalies that last for a period of time (for example, abnormal electrocardiogram patterns during arrhythmia), and in such cases an abnormal subsequence detection algorithm is preferable.

Given time series data T = t_1, t_2, ..., t_n of length n, a subsequence of length L starting at position i can be represented as T_{i,L} = t_i, t_{i+1}, ..., t_{i+L-1}. The definition of an abnormal subsequence currently in common use is the subsequence with the largest nearest-neighbor distance in the time series T. That is, for a subsequence D and any other subsequence C, with corresponding sets of non-overlapping subsequences M_D and M_C, if min(Dist(D, M_D)) > min(Dist(C, M_C)), then D is an abnormal subsequence of the time series T. In the most basic definition, the distance between two sequences is the Euclidean distance, and other reasonable distance measures can be used in practice. The definition can also easily be extended to output several abnormal subsequences.
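The definition above can be made concrete with a brute-force sketch. The function names and the toy series are hypothetical, plain Euclidean distance is assumed, and "non-overlapping" is taken to mean that the two start positions differ by at least the subsequence length.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_discord(series, length):
    """Return (start, nn_dist) of the subsequence whose nearest
    non-overlapping neighbor is farthest away."""
    n_sub = len(series) - length + 1
    best_start, best_dist = -1, -1.0
    for i in range(n_sub):
        nn = math.inf
        for j in range(n_sub):
            if abs(i - j) >= length:  # skip overlapping candidates
                d = euclidean(series[i:i + length], series[j:j + length])
                nn = min(nn, d)
        if nn > best_dist:  # strict '>' keeps the earliest start on ties
            best_start, best_dist = i, nn
    return best_start, best_dist

# Toy series: a repeating 0/1 pattern with one spike at index 7.
series = [0, 1, 0, 1, 0, 1, 0, 5, 0, 1, 0, 1, 0, 1, 0, 1]
start, dist = brute_force_discord(series, 2)
print(start, dist)
```

Every pair of subsequences is compared here, which is exactly the cost that the following paragraphs set out to reduce.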

However, abnormal subsequence detection also faces difficulties such as low computational efficiency, parameter dependence, and the identification of similar anomalies. First, regarding computational efficiency: under the definition above, the most straightforward implementation loops over every pair of subsequences, computes the distance between them, and extracts the abnormal subsequences from the results. The time complexity of such an algorithm is O(n^2 m), where n is the time series length and m is the subsequence length. Because time series data are often very long, a brute-force algorithm is practically infeasible. In recent years many studies have addressed this problem, proposing various algorithms that improve efficiency either by applying a dimensionality-reduced representation and pre-sorting to the time series or by setting a distance threshold. The former are mostly heuristic algorithms whose actual efficiency depends strongly on the settings of several parameters and on the characteristics of the data; when the parameters are set improperly or the data distribution does not meet expectations, efficiency may degrade. The latter prune the distance computation by means of a threshold and are likewise strongly dependent on the threshold setting: an improper threshold may cause the algorithm to fail (return no abnormal subsequence) or to lose efficiency, and the threshold is difficult to estimate in advance from experience.

Yeh et al. (2018) proposed a breakthrough algorithm, STOMP, which uses the fast Fourier transform and a moving dot product to greatly speed up the computation of distances between all pairs of subsequences. The computation does not depend on the settings of other parameters or on the distribution of the data, so distance computation on large data sets becomes feasible and its cost predictable.

Another problem with abnormal subsequence detection is parameter dependence. Under the definition above, the algorithm needs only one input parameter: the target subsequence length. However, this parameter has a decisive influence on the detection result. Different target subsequence lengths may produce completely different detection results, rather than merely slightly different detection accuracy. Previous studies have paid little attention to this problem and have generally treated it as a necessary parameter-tuning step, but given the time cost of abnormal subsequence detection, finding a suitable target subsequence length is not easy. The invention therefore aims to remove the dependency on this parameter as far as possible and to obtain more stable and efficient results.

There is also the problem of similar anomalies. Under the definition above, an abnormal subsequence is the subsequence with the largest nearest-neighbor distance in the time series T. In practical applications, we find that the data may contain two or more abnormal subsequences with similar shapes. Because the definition uses the nearest-neighbor distance as its measure, two such similar abnormal subsequences are each other's nearest neighbor and may have a very small nearest-neighbor distance, which causes the detection algorithm to miss them. The invention therefore also aims to improve the distance index so that it can cope with similar anomalies and related data problems.
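The failure mode can be reproduced on a toy series. The sketch below is illustrative only: `knn_scores` is a hypothetical helper, plain Euclidean distance is assumed, and the mean of the k smallest distances is one possible way to look beyond the single nearest neighbor.

```python
import math
from statistics import mean

def knn_scores(series, length, k):
    """For every subsequence, average the distances to its k nearest
    non-overlapping neighbors (brute force)."""
    n_sub = len(series) - length + 1
    subs = [series[i:i + length] for i in range(n_sub)]
    scores = []
    for i in range(n_sub):
        dists = sorted(
            math.dist(subs[i], subs[j])
            for j in range(n_sub)
            if abs(i - j) >= length  # non-overlapping candidates only
        )
        scores.append(mean(dists[:k]))
    return scores

# Two identical spikes (indices 5 and 11) in a repeating 0/1 pattern.
series = [0, 1, 0, 1, 0, 5, 0, 1, 0, 1, 0, 5, 0, 1, 0, 1, 0, 1]
nn1 = knn_scores(series, 2, k=1)   # each spike matches its twin exactly
nn2 = knn_scores(series, 2, k=2)   # the second neighbor exposes both spikes
flagged = [i for i, s in enumerate(nn2) if s > 0]
print(max(nn1), flagged)
```

With k = 1 every score is zero, so neither spike stands out; with k = 2 only the subsequences that contain a spike receive a positive score.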

Disclosure of Invention

The invention provides a method for detecting abnormal subsequences of indefinite length in a time series. It addresses the problems of the prior art and improves on it in the following respects:

(1) in the definition of an abnormal subsequence, the traditional nearest-neighbor distance is extended to the mean or median of the k-nearest-neighbor distances, to handle the similar anomalies that may exist in the data;

(2) in the computation of subsequence distances, parallel optimization is applied on top of the STOMP algorithm, further improving running efficiency;

(3) in the setting of algorithm parameters, a subsequence length range and a step size replace the single target subsequence length, and the algorithm performs abnormal subsequence detection at every step-size interval within the input length range;

(4) in the algorithm output, the abnormal subsequences detected at each length can be output directly, which reveals different anomalies, and an anomaly score and the abnormal subsequences determined from the detection results and a chosen evaluation index (such as the number of times a subsequence recurs) can also be output.

The technical scheme is as follows:

A method for detecting abnormal subsequences of indefinite length in a time series comprises the following steps:

S1: input the time series data T, a minimum subsequence length min_len, a maximum subsequence length max_len, and a step size step; optionally input the number of neighbors k, the k-nearest-neighbor distance aggregation method, the number of abnormal subsequences to detect n_discords, and the number of parallel processes n_workers;

S2: determine the set of target subsequence lengths sub_len from the minimum subsequence length min_len, the maximum subsequence length max_len, and the step size step, and execute the following loop for each target subsequence length sub_len in the set:

a) divide the time series T into several time series subsets T_worker according to the number of parallel processes n_workers;

b) in each process, apply the STOMP algorithm to compute the local neighbor matrix mp_worker between each subsequence in T_worker and the time series T;

c) merge the results mp_worker of all processes, retain the k nearest-neighbor distances for each subsequence in the time series T, and take the median or mean of those k distances to form the k-nearest-neighbor distance matrix mp_{sub_len} of the time series T at subsequence length sub_len;

S3: from the k-nearest-neighbor distance matrix mp_{sub_len} for each target subsequence length sub_len, obtain the n_discords abnormal subsequences with the largest k-nearest-neighbor distances;

S4: for each data point T_i in the time series T, count the number of times it appears in the abnormal subsequences of the various lengths, and mark the points whose count exceeds a given threshold as the finally detected anomalies.

Further, in step S1, k defaults to 1, the k-nearest-neighbor distance aggregation method defaults to the median, the number of abnormal subsequences to detect n_discords defaults to 3, and the number of processes n_workers defaults to 4.

Further, in step S1, the time series data T should include a time column and a value column, giving the value of the time series at each time point, and the time points are preferably equally spaced. The minimum subsequence length min_len, the maximum subsequence length max_len, and the step size step determine the range of subsequence lengths: within the range from min_len to max_len inclusive, one value is taken every step as a target subsequence length, and the loop computations S2a-S2c are performed in turn for each length in the resulting set.

Further, in step S2, the first round of anomaly detection uses the nearest-neighbor distance (k = 1); if the result shows similar anomaly patterns that cannot be identified, the value of k is increased and anomalies are detected with the k-nearest-neighbor distance.

In step S2, the time series T is divided into several time series subsets T_worker. In each parallel process, the time series data T, the data subset T_worker, the target subsequence length sub_len, and the number of neighbors k are input, and the local k-nearest-neighbor distance matrix matrix_profile_worker (written mp_worker below) is initialized with n-sub_len+1 rows and k columns, all initial values being positive infinity.

In step S2, for each subsequence T_{i,sub_len} in the data subset T_worker, the STOMP algorithm is applied to compute, by means of the fast Fourier transform and a moving dot product, the distance between T_{i,sub_len} and every subsequence in the time series data T, giving a distance vector dist_{i,sub_len} of length n-sub_len+1.

In step S2, if k = 1, that is, only the nearest-neighbor distance is computed, each distance vector dist_{i,sub_len} is compared element by element with the local neighbor matrix mp_worker and the smaller distance is kept at each position; if k > 1, k neighbor distances must be kept, so each distance vector dist_{i,sub_len} is merged with the local neighbor matrix mp_worker and the k smallest values at each position are retained.

In step S2, the local neighbor matrices mp_worker computed for the data subsets T_worker are merged, the k smallest values at each position are retained, and their mean or median is computed to obtain the k-nearest-neighbor distance matrix mp_{sub_len} of the time series data T at subsequence length sub_len.

Further, in step S3, for each target subsequence length sub_len, the computed k-nearest-neighbor distance matrix mp_{sub_len} is sorted in descending order and the subsequence with the largest neighbor distance is selected as an abnormal subsequence. If n_discords is greater than 1, the remaining subsequences are checked one by one in that order: if the difference between the position i of a subsequence and the position of an already selected abnormal subsequence is less than sub_len, that is, if the subsequence overlaps an already selected abnormal subsequence, it is skipped. Checking continues until the number of abnormal subsequences reaches n_discords, which gives the abnormal subsequence set discords_{sub_len} for the target subsequence length sub_len.

Further, in step S4, after the abnormal subsequence set for each target subsequence length has been obtained, the final abnormal subsequences are determined by establishing an evaluation index.

The method for detecting abnormal subsequences of indefinite length in a time series can significantly improve the running efficiency and the detection accuracy of abnormal subsequence detection. For running efficiency, the original time series data are divided into several subsets, several processes are started to compute the k-nearest-neighbor distance matrices of the subsets in parallel, and the results are merged to obtain the k-nearest-neighbor distance matrix of the original data. For detection accuracy, the method detects abnormal subsequences with the k-nearest-neighbor distance instead of the nearest-neighbor distance, and uses a subsequence length range and a step size instead of the usual fixed subsequence length, so that multiple types of anomaly patterns in the data can be detected more accurately in practice.

Drawings

FIG. 1 is a schematic flow chart of the method for detecting abnormal subsequences of indefinite length in a time series;

FIG. 2 is a schematic diagram of the abnormal subsequence detection results for the New York City taxi passenger data in an embodiment;

FIG. 3 is a schematic diagram of the effect of the method for detecting abnormal subsequences of indefinite length in a time series.

Detailed Description

As shown in FIG. 1, the method for detecting abnormal subsequences of indefinite length in a time series comprises the following steps:

S1: input the time series data T, a minimum subsequence length min_len, a maximum subsequence length max_len, and a step size step; optionally input the number of neighbors k (default 1), the k-nearest-neighbor distance aggregation method (default median), the number of abnormal subsequences to detect n_discords (default 3), and the number of processes n_workers (default 4).

S2: determine the set of target subsequence lengths sub_len from the minimum subsequence length, the maximum subsequence length, and the step size, and execute the following loop for each target subsequence length sub_len in the set:

a) divide the time series T into several time series subsets T_worker according to the number of parallel processes n_workers (see step 4.1);

b) in each process, apply the STOMP algorithm to compute the local neighbor matrix mp_worker between each subsequence in T_worker and the time series T (see steps 4.2, 4.3, and 4.4);

c) merge the results mp_worker of all processes, retain the k nearest-neighbor distances for each subsequence in the time series T, and take the median (or mean) of those k distances to form the k-nearest-neighbor distance matrix mp_{sub_len} of the time series T at subsequence length sub_len (see step 4.5).

S3: from the k-nearest-neighbor distance matrix mp_{sub_len} for each target subsequence length sub_len, obtain the n_discords abnormal subsequences with the largest k-nearest-neighbor distances.

In step S1, the time series data T should include a time column and a value column, giving the value of the time series at each time point, and the time points are preferably equally spaced. The minimum subsequence length min_len, the maximum subsequence length max_len, and the step size step determine the range of subsequence lengths: within the range from min_len to max_len inclusive, one value is taken every step as a target subsequence length, and the loop computations S2a-S2c are performed in turn for each length in the resulting set.

The following describes an embodiment of the invention in detail, using a time series data set of New York City taxi passenger counts as an example.

1. Acquire the data. The New York taxi passenger data set contains two columns, timestamp and passenger count (value). It spans July 1, 2014 to January 31, 2015 at 30-minute intervals, for a total of 10,320 data points.

2. Determine the parameters. The most important input parameters of the algorithm are those describing the subsequence length range: the minimum and maximum subsequence lengths and the step size. Because the subsequence length strongly affects the result of abnormal subsequence detection, the invention replaces the single subsequence length with a length range (minimum subsequence length, maximum subsequence length, and step size). This significantly improves the stability of the detection result, but the user still needs to supply a reasonable length range. Empirically, suitable minimum and maximum subsequence lengths can be estimated with the following rule: if an actual anomaly spans L data points (for example, 10), it is detected well when the subsequence length is roughly in the range 1.5L to 3L (that is, 15 to 30 data points). In practice, the time span of the actual anomalies can be roughly estimated from the data characteristics or from background knowledge, and the input length range can then be chosen using this rule.

3. Apply the rule to this embodiment. Inspection of the data shows that the shortest anomalies may occur within 1-2 hours (2-4 data points), while the longest may last 1-2 days (48-96 data points). The minimum subsequence length is therefore chosen as 8 (4 hours), the maximum subsequence length as 240 (5 days), and the step size as 8. The purpose of the step size is to reduce redundant computation at adjacent subsequence lengths. If no step size is set (that is, the step size defaults to 1), the algorithm must run the loop for every target length between the minimum and maximum subsequence lengths (in this example 8, 9, 10, …, 239, 240, a total of 233 target lengths), which takes a large amount of computation time. The results detected at adjacent target subsequence lengths (for example, 8 and 9) are very close, so the repeated computation adds little. Setting a step size therefore greatly improves computational efficiency without affecting the final detection result. In this example, with the step size set to 8, the loop only needs to run for 30 target subsequence lengths sub_len: 8, 16, 24, …, 232, 240.
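The counts quoted above can be checked with a few lines. The helper name `target_lengths` is hypothetical; the range is taken as inclusive at both ends, as described for step S1.

```python
def target_lengths(min_len, max_len, step=1):
    """Target subsequence lengths from min_len to max_len inclusive."""
    return list(range(min_len, max_len + 1, step))

lengths_step8 = target_lengths(8, 240, 8)
lengths_step1 = target_lengths(8, 240)
print(len(lengths_step8), lengths_step8[:3], lengths_step8[-2:])
print(len(lengths_step1))
```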

4. Compute the k-nearest-neighbor distance matrix for each target subsequence length in a loop. In practice, the number of neighbors k and the aggregation method for the k-nearest-neighbor distances (mean or median) must be input. In general, the first round of anomaly detection can use k = 1, that is, the nearest-neighbor distance. In this example, detection is first attempted with k = 1. Once k has been chosen, the k-nearest-neighbor distance matrix is computed starting from the minimum subsequence length; the subsequence length is then increased by one step size, and the process repeats until the maximum subsequence length is exceeded. In each iteration, the k-nearest-neighbor distance matrix is computed as follows:

4.1. Partition the original time series data into subsets T_worker. The partitioning mainly serves the subsequent parallel computation; the specific partitioning scheme and its result do not affect the final anomaly detection result and only affect the running efficiency of the algorithm to some extent. In this example, the subset size is 200, giving 51 subsets in total, and 4 processes are allocated for parallel computation. In other cases, the subsets may be partitioned according to the actual data size and the available computing resources; the invention places no limitation on this.
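One possible partitioning is sketched below. It is an assumption, not the patent's stated scheme: the text does not say whether data points or subsequence start positions are partitioned, and this sketch splits the start positions into consecutive chunks of a fixed size. The name `partition_starts` is hypothetical.

```python
def partition_starts(n, sub_len, chunk_size):
    """Split the n - sub_len + 1 subsequence start positions into
    consecutive chunks of at most chunk_size positions each."""
    n_sub = n - sub_len + 1
    return [
        range(lo, min(lo + chunk_size, n_sub))
        for lo in range(0, n_sub, chunk_size)
    ]

chunks = partition_starts(n=10, sub_len=3, chunk_size=3)
print([list(c) for c in chunks])
```

Each chunk can then be handed to one worker process; because every start position lands in exactly one chunk, the merged result does not depend on the chunk size.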

4.2. In each parallel process, input the time series data T (with n data points), the data subset T_worker, the target subsequence length sub_len, and the number of neighbors k, and initialize the local neighbor matrix mp_worker with n-sub_len+1 rows and k columns, all initial values being positive infinity;

4.3. For each subsequence T_{i,sub_len} in the data subset T_worker, apply the STOMP algorithm to compute, by means of the fast Fourier transform and a moving dot product, the distance between T_{i,sub_len} and every subsequence in the time series data T, giving a distance vector dist_{i,sub_len} of length n-sub_len+1;
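The moving dot product can be sketched as follows. This is a simplified illustration rather than STOMP itself: the first row of dot products is computed directly instead of by fast Fourier transform, and plain Euclidean distance is assumed in place of the z-normalized distance used by the published algorithm. All function names are hypothetical.

```python
import math

def sliding_dot_rows(t, m):
    """Yield (i, qt) with qt[j] = dot(t[i:i+m], t[j:j+m]); after the first
    row, each entry is updated in O(1) from the previous row's diagonal."""
    n_sub = len(t) - m + 1
    first = [sum(t[k] * t[j + k] for k in range(m)) for j in range(n_sub)]
    qt = first[:]
    yield 0, qt
    for i in range(1, n_sub):
        new = [0.0] * n_sub
        new[0] = first[i]  # symmetry: dot(i, 0) equals dot(0, i)
        for j in range(1, n_sub):
            # drop the product leaving the window, add the one entering it
            new[j] = qt[j - 1] - t[i - 1] * t[j - 1] + t[i + m - 1] * t[j + m - 1]
        qt = new
        yield i, qt

def distance_rows(t, m):
    """Yield (i, dist_i): Euclidean distances from subsequence i to all others,
    using |a - b|^2 = |a|^2 + |b|^2 - 2 * dot(a, b)."""
    n_sub = len(t) - m + 1
    sq = [sum(x * x for x in t[i:i + m]) for i in range(n_sub)]
    for i, qt in sliding_dot_rows(t, m):
        yield i, [math.sqrt(max(sq[i] + sq[j] - 2 * qt[j], 0.0)) for j in range(n_sub)]

t = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.0]
rows = dict(distance_rows(t, 3))
print(rows[0])
```

Each yielded row plays the role of the distance vector dist_{i,sub_len}, and its length is n-sub_len+1 as stated in the step.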

4.4. If k = 1, that is, only the nearest-neighbor distance is computed, compare each distance vector dist_{i,sub_len} element by element with the local neighbor matrix mp_worker and keep the smaller distance at each position; if k > 1, k neighbor distances must be kept, so merge each distance vector dist_{i,sub_len} with the local neighbor matrix mp_worker and retain the k smallest values at each position;
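A minimal sketch of this update, assuming each row of mp_worker is kept sorted in ascending order and that distances for overlapping pairs have already been set to infinity by the caller (an assumption; the step itself does not mention overlap handling). The name `update_knn` is hypothetical.

```python
import math

def update_knn(mp_worker, dist_vec):
    """Merge one distance vector into the local neighbor matrix in place,
    keeping the k smallest distances seen so far at each position."""
    for row, d in zip(mp_worker, dist_vec):
        if d < row[-1]:   # row[-1] is the current k-th smallest distance
            row[-1] = d   # replace it and restore ascending order
            row.sort()

k = 2
mp_worker = [[math.inf] * k for _ in range(3)]  # 3 positions, k columns
update_knn(mp_worker, [5.0, 1.0, math.inf])
update_knn(mp_worker, [2.0, 4.0, 3.0])
update_knn(mp_worker, [9.0, 0.5, 1.0])
print(mp_worker)
```

With k = 1 each row holds a single value and the update reduces to the element-wise minimum described for the nearest-neighbor case.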

4.5. Merge the local neighbor matrices mp_worker computed for the data subsets T_worker, retain the k smallest values at each position, and compute their mean or median according to the chosen aggregation method to obtain the k-nearest-neighbor distance matrix mp_{sub_len} of the time series data T at subsequence length sub_len.
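The merge and aggregation can be sketched as below. The function name and the `how` parameter are hypothetical, and each worker matrix is assumed to be a list of rows with k distances per row.

```python
from statistics import mean, median

def merge_and_aggregate(worker_matrices, k, how="median"):
    """Pool the per-worker neighbor distances at each position, keep the k
    smallest, and reduce them to one score per position."""
    agg = median if how == "median" else mean
    n_rows = len(worker_matrices[0])
    profile = []
    for r in range(n_rows):
        pooled = sorted(d for mp in worker_matrices for d in mp[r])[:k]
        profile.append(agg(pooled))
    return profile

# Two workers, one position, k = 3.
workers = [[[1.0, 4.0, 10.0]], [[3.0, 6.0, 7.0]]]
by_median = merge_and_aggregate(workers, k=3, how="median")
by_mean = merge_and_aggregate(workers, k=3, how="mean")
print(by_median, by_mean)
```

The pooled k smallest values here are 1, 3, and 4, so the two aggregation methods give different scores for the same position.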

5. Detect the abnormal subsequences from the k-nearest-neighbor distance matrix mp_{sub_len} for each target subsequence length sub_len. In practice, the number of abnormal subsequences to detect, n_discords, must be input. This parameter depends on an estimate of how many anomalies occur in the actual data and is not limited by the invention; in this example the default value 3 is used. For each target subsequence length sub_len, the computed k-nearest-neighbor distance matrix mp_{sub_len} is sorted in descending order and the subsequence with the largest neighbor distance is selected as an abnormal subsequence. If n_discords is greater than 1, the remaining subsequences are checked one by one in that order: if the difference between the position i of a subsequence and the position of an already selected abnormal subsequence is less than sub_len, that is, if the subsequence overlaps an already selected abnormal subsequence, it is skipped. Checking continues until the number of abnormal subsequences reaches n_discords, which gives the abnormal subsequence set discords_{sub_len} for the target subsequence length sub_len.
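The selection rule in this step can be sketched as follows, assuming the k-nearest-neighbor distance matrix has already been reduced to one score per start position. The name `top_discords` and the sample scores are hypothetical.

```python
def top_discords(profile, sub_len, n_discords=3):
    """Pick start positions in descending score order, skipping any that
    overlap (start difference < sub_len) an already selected one."""
    order = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    picked = []
    for i in order:
        if all(abs(i - j) >= sub_len for j in picked):
            picked.append(i)
            if len(picked) == n_discords:
                break
    return picked

profile = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3, 0.6, 0.1]
discords = top_discords(profile, sub_len=2, n_discords=3)
print(discords)
```

Position 2 has the second-highest score but overlaps position 1, so it is skipped in favor of positions 4 and 6.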

6. Determine the final abnormal subsequences by establishing an evaluation index. The previous step produced an abnormal subsequence set for each target subsequence length. On that basis, the user can establish an evaluation index to determine the final abnormal subsequences; the index is not fixed. In this example, the number of recurrences is used as the evaluation index: for each data point T_i in the time series T, count the number of times it appears in the abnormal subsequence sets discords_{sub_len}, and if that count is greater than or equal to 5, output the point as a final anomaly.
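The recurrence count can be sketched as a per-point vote. This is one reading of the step, stated as an assumption: a point receives at most one vote per target length, namely when it lies inside any abnormal subsequence found at that length. The function name and the toy input are hypothetical.

```python
def vote_anomalous_points(n, discords_by_len, min_count):
    """Count, for each of the n data points, at how many target lengths it
    falls inside a detected abnormal subsequence; keep frequent points."""
    counts = [0] * n
    for sub_len, starts in discords_by_len.items():
        covered = set()
        for s in starts:
            covered.update(range(s, min(s + sub_len, n)))
        for p in covered:
            counts[p] += 1  # at most one vote per length
    flagged = [p for p, c in enumerate(counts) if c >= min_count]
    return flagged, counts

# Start positions of detected abnormal subsequences, keyed by length.
discords_by_len = {2: [4], 3: [4, 9], 4: [3]}
flagged, counts = vote_anomalous_points(12, discords_by_len, min_count=3)
print(flagged, counts)
```

In the embodiment the threshold is 5 and there are 30 target lengths; the toy input uses 3 lengths and a threshold of 3 to keep the example small.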

7. Evaluate the results. The result of abnormal subsequence detection on the New York City taxi passenger data, obtained through the steps above, is shown in FIG. 2. The upper part shows the original data with the anomalies marked, and the lower part shows the abnormal subsequence sets detected by the algorithm at each subsequence length. With a recurrence count of at least 5 as the evaluation index, the algorithm accurately identifies 5 anomalies in the data, corresponding respectively to the switch to winter time, Thanksgiving, Christmas, New Year's Day, and a snowstorm, and it produces no false positives.

As the embodiment shows, the invention can significantly improve the running efficiency and the detection accuracy of abnormal subsequence detection in time series.

For running efficiency, the original time series data are divided into several subsets, several processes are started to compute the k-nearest-neighbor distance matrices of the subsets in parallel, and the results are merged to obtain the k-nearest-neighbor distance matrix of the original data. In a test on KPI time series data of length 20000, with a subsequence length of 720 and k = 1, computing the k-nearest-neighbor distance matrix took about 85 seconds with the original STOMP algorithm, about 36 seconds with 2 parallel processes, and about 25 seconds with 4 parallel processes, a clear improvement in running efficiency.
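For reference, the speedup implied by these approximate timings can be worked out as follows. The symbols S_p (speedup with p processes) and E_p (parallel efficiency) are introduced here for illustration and do not appear in the source; the inputs are the rounded figures quoted above, so the results are approximate as well.

```latex
S_p = \frac{T_{\text{STOMP}}}{T_p}, \qquad E_p = \frac{S_p}{p}

S_2 \approx \frac{85}{36} \approx 2.4, \qquad E_2 \approx 1.2

S_4 \approx \frac{85}{25} = 3.4, \qquad E_4 \approx 0.85
```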

For detection accuracy, the method detects abnormal subsequences with the k-nearest-neighbor distance instead of the nearest-neighbor distance, and uses a subsequence length range and a step size instead of the usual fixed subsequence length, so that multiple types of anomaly patterns in the data can be detected more accurately in practice. The same KPI time series data of length 20000 were used for testing, with a minimum subsequence length of 180, a maximum subsequence length of 1440, and a step size of 30; the anomaly detection result is shown in FIG. 3. The upper part of FIG. 3 shows the original data with the actual anomalies marked, and the lower part shows the positions of the abnormal subsequences detected at each subsequence length. With more than 10 recurrences as the criterion for selecting the final anomalies, the algorithm correctly identifies the 4 main anomalies in the data and produces only 1 false positive.

Claims

1. A method for detecting abnormal subsequences with indefinite length in a time sequence comprises the following steps:

2. The method for detecting abnormal subsequences of indefinite length in a time series according to claim 1, wherein: in step S1, k defaults to 1, the k-nearest-neighbor distance aggregation method defaults to the median, the number of abnormal subsequences to detect n_discords defaults to 3, and the number of processes n_workers defaults to 4.

3. The method for detecting abnormal subsequences of indefinite length in a time series according to claim 1, wherein: in step S1, the time series data T should include a time column and a value column, giving the value of the series at each time point, and the time points are preferably equally spaced; the minimum subsequence length min_len, the maximum subsequence length max_len, and the step size step determine the range of subsequence lengths, that is, within the range from min_len to max_len inclusive, one value is taken every step as a target subsequence length, and the loop computations S2a-S2c are performed in turn for each length in the resulting set.

4. The method for detecting abnormal subsequences of indefinite length in a time series according to claim 1, wherein: in step S2, the first round of anomaly detection uses the nearest-neighbor distance (k = 1), and if the result shows similar anomaly patterns that cannot be identified, the value of k is increased and anomalies are detected with the k-nearest-neighbor distance.

5. The method according to claim 3, wherein: in step S2, the time series T is divided into several time series subsets T_worker; in each parallel process, the time series data T, the data subset T_worker, the target subsequence length sub_len, and the number of neighbors k are input, and the local k-nearest-neighbor distance matrix matrix_profile_worker (written mp_worker below) is initialized with n-sub_len+1 rows and k columns, all initial values being positive infinity.

6. The method according to claim 5, wherein: in step S2, for each subsequence T_{i,sub_len} in the data subset T_worker, the STOMP algorithm is applied to compute, by means of the fast Fourier transform and a moving dot product, the distance between T_{i,sub_len} and every subsequence in the time series data T, giving a distance vector dist_{i,sub_len} of length n-sub_len+1.

7. The method according to claim 6, wherein: in step S2, if k = 1, that is, only the nearest-neighbor distance is computed, each distance vector dist_{i,sub_len} is compared element by element with the local neighbor matrix mp_worker and the smaller distance is kept at each position; if k > 1, k neighbor distances must be kept, so each distance vector dist_{i,sub_len} is merged with the local neighbor matrix mp_worker and the k smallest values at each position are retained.

8. The method according to claim 7, wherein: in step S2, the local neighbor matrices mp_worker computed for the data subsets T_worker are merged, the k smallest values at each position are retained, and their mean or median is computed according to the chosen aggregation method to obtain the k-nearest-neighbor distance matrix mp_{sub_len} of the time series data T at subsequence length sub_len.

9. The method according to claim 8, wherein: in step S3, for each target subsequence length sub_len, the computed k-nearest-neighbor distance matrix mp_{sub_len} is sorted in descending order and the subsequence with the largest neighbor distance is selected as an abnormal subsequence; if n_discords is greater than 1, the remaining subsequences are checked one by one in that order, and if the difference between the position i of a subsequence and the position of an already selected abnormal subsequence is less than sub_len, that is, if the subsequence overlaps an already selected abnormal subsequence, it is skipped; checking continues until the number of abnormal subsequences reaches n_discords, which gives the abnormal subsequence set discords_{sub_len} for the target subsequence length sub_len.

10. The method according to claim 9, wherein: in step S4, after the abnormal subsequence set for each target subsequence length has been obtained, the final abnormal subsequences are determined by establishing an evaluation index.