CN110071913A - Time series anomaly detection method based on unsupervised learning - Google Patents

Info

Publication number
CN110071913A
Authority
CN
China
Prior art keywords
time series
data
unsupervised learning
anomaly detection
gaussian
Prior art date
Legal status
Granted
Application number
CN201910234623.8A
Other languages
Chinese (zh)
Other versions
CN110071913B (en)
Inventor
杨恺
刘音希
窦绍瑜
Current Assignee
Tongji University
Original Assignee
Tongji University
Priority date
2019-03-26
Filing date
2019-03-26
Publication date
2019-07-30
Application filed by Tongji University
Priority to CN201910234623.8A
Publication of CN110071913A
Application granted
Publication of CN110071913B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The present invention relates to a time series anomaly detection method based on unsupervised learning, comprising: segmenting time series data at the positions where it changes significantly, and padding each resulting data segment to a set length; training a neural network for anomaly detection on the segmented and padded data segments of time series recorded under normal conditions; feeding the segmented and padded data segments of the time series to be tested into the anomaly detection model, which outputs an anomaly score; and judging whether the anomaly score exceeds a threshold: if so, an anomaly is declared; otherwise, no anomaly is declared. Compared with the prior art, the present invention does not depend on labeled anomaly data, does not lose data information, and performs well.

Description

A Time Series Anomaly Detection Method Based on Unsupervised Learning

Technical Field

The invention relates to an anomaly detection method, and in particular to a time series anomaly detection method based on unsupervised learning.

Background Art

Anomaly detection is a means of detecting anomalies in data, where an "anomaly" is a pattern that does not conform to normal behavior. For example, in network traffic analysis, normal patterns correspond to legitimate network access and anomalous patterns correspond to the behavior of network intruders. Anomaly detection is applied in many fields, such as healthcare, network security, financial security, and system maintenance.

A time series is a sequence of data in the form <timestamp, value>. Time series are often used to record, in real time, data such as system operating status and human health indicators. Analyzing time series data makes it possible to determine the state a system is in and to analyze its behavior, assisting human decision-making. In practice, many systems record their operating status as time series, for example website traffic and server CPU usage. In the medical and health field, electrocardiogram (ECG) data and disease progression data are also represented as time series.

Anomalies in a time series often reflect anomalies in the underlying system. For example, in a website system, database blocking or deadlock shows up in the database monitoring data, and in ECG data, abnormalities caused by heart disease appear in the recorded signal. Anomaly detection on time series data therefore helps people discover anomalies as early as possible and take appropriate measures to avoid them.

At present, anomaly detection methods fall into two categories: supervised and unsupervised. Supervised methods require a large amount of data labeled with anomalies for model training; however, anomalies are rare, so large amounts of anomalous data are hard to obtain in practice. We therefore use an unsupervised method for anomaly detection.

Summary of the Invention

The purpose of the present invention is to overcome the above defects of the prior art by providing a time series anomaly detection method based on unsupervised learning.

The purpose of the present invention can be achieved through the following technical solution:

A time series anomaly detection method based on unsupervised learning, comprising:

segmenting the time series data at the positions where it changes significantly, and padding each resulting data segment to a set length;

training an anomaly detection model using, as input, the segmented and padded data segments of time series recorded under normal conditions;

feeding the segmented and padded data segments of the time series to be tested into the anomaly detection model, which outputs an anomaly score;

judging whether the anomaly score exceeds a threshold: if so, an anomaly is judged to have occurred; otherwise, no anomaly is judged to have occurred.

Segmenting the time series data at the positions where it changes significantly specifically comprises:

finding all extreme points of the time series data;

then using the positions of the extreme points with large absolute values as split points to divide the time series into multiple data segments, where the split points are determined by a manually set threshold on the absolute value of the extreme points.
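The segmentation rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes NumPy, treats an interior point as an extreme point when the first difference changes sign strictly, and starts each new segment at its split point. The function names and the threshold value are hypothetical.

```python
import numpy as np


def extreme_points(x):
    """Return indices of the interior local maxima and minima of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = np.diff(x)
    # A strict sign change of the first difference marks a local extremum;
    # flat stretches (zero differences) are not treated as extrema here.
    return np.where(d[:-1] * d[1:] < 0)[0] + 1


def split_at_extrema(x, abs_threshold):
    """Cut x at every extreme point whose absolute value exceeds abs_threshold.

    Each split point becomes the first element of the following segment.
    """
    x = np.asarray(x, dtype=float)
    cuts = [int(i) for i in extreme_points(x) if abs(x[i]) > abs_threshold]
    return [seg for seg in np.split(x, cuts) if seg.size > 0]


series = [0, 1, 3, 1, 0, -1, -4, -1, 0]
print([s.tolist() for s in split_at_extrema(series, abs_threshold=2.0)])
# [[0.0, 1.0], [3.0, 1.0, 0.0, -1.0], [-4.0, -1.0, 0.0]]
```

With `abs_threshold=2.0`, both the maximum 3 and the minimum -4 qualify, so the series is cut into three segments. Raising the threshold above 3 leaves only the cut at -4.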

The anomaly detection model comprises a data compressor and a Gaussian mixture model estimator. The data compressor uses a many-to-many LSTM network structure, and the Gaussian mixture model estimator uses a multilayer perceptron structure.

The compression process of the data compressor comprises:

compressing and reconstructing the data segment;

computing the relative distance and the cosine distance between the segment before compression and its reconstruction;

combining the relative distance, the cosine distance, and the output of the LSTM hidden layer into the input of the Gaussian mixture model estimator.

The relative distance is expressed as:

$$r = \frac{\sqrt{\sum_{i=1}^{L}\left(x_i - x'_i\right)^2}}{\sqrt{\sum_{i=1}^{L} x_i^2}}$$

where $r$ is the relative distance, $L$ is the length of the time series contained in the data segment, $x_i$ is an element of that time series, and $x'_i$ is the corresponding element of the reconstructed time series.

The cosine distance is expressed as:

$$c = \frac{\sum_{i=1}^{L} x_i x'_i}{\lVert x \rVert \, \lVert x' \rVert}$$

where $c$ is the cosine distance, $\lVert\cdot\rVert$ denotes the norm, $x_i$ is an element of the time series contained in the data segment, and $x'_i$ is the corresponding element of the reconstructed time series.

The training process of the Gaussian mixture model estimator comprises:

receiving the output of the data compressor and mapping it to a K-dimensional vector, where K is the number of Gaussian distributions in the model;

obtaining the mixture probability, mean, and covariance of each Gaussian distribution from the elements of the K-dimensional vector.

The detection process of the Gaussian mixture model comprises:

receiving the output of the data compressor and computing the anomaly score.

The anomaly score is expressed as:

$$\mathrm{Score}(z) = -\log\left(\sum_{k=1}^{K} \hat{\phi}_k \frac{\exp\left(-\frac{1}{2}(z-\hat{\mu}_k)^{\top}\hat{\Sigma}_k^{-1}(z-\hat{\mu}_k)\right)}{\sqrt{\left|2\pi\hat{\Sigma}_k\right|}}\right)$$

where $\mathrm{Score}(z)$ is the anomaly score, $\hat{\phi}_k$ is the mixture probability of the k-th Gaussian distribution, $\hat{\Sigma}_k$ is the covariance of the k-th Gaussian distribution, $z$ is the output of the data compressor, $\hat{\mu}_k$ is the mean of the k-th Gaussian distribution, and $\hat{\Sigma}_k^{-1}$ is the inverse matrix of $\hat{\Sigma}_k$.

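The mixture probability of the k-th Gaussian distribution is:

$$\hat{\phi}_k = \sum_{i=1}^{N}\frac{\hat{\gamma}_{ik}}{N}$$

The mean of the k-th Gaussian distribution is:

$$\hat{\mu}_k = \frac{\sum_{i=1}^{N}\hat{\gamma}_{ik} z_i}{\sum_{i=1}^{N}\hat{\gamma}_{ik}}$$

The covariance of the k-th Gaussian distribution is:

$$\hat{\Sigma}_k = \frac{\sum_{i=1}^{N}\hat{\gamma}_{ik}(z_i-\hat{\mu}_k)(z_i-\hat{\mu}_k)^{\top}}{\sum_{i=1}^{N}\hat{\gamma}_{ik}}$$

where $N$ is the total number of training samples, $\hat{\gamma}_{ik}$ is the k-th component of the K-dimensional vector computed for the i-th training sample, and $z_i$ is the i-th training sample.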

The data compressor and the Gaussian mixture model estimator are trained end to end, with the following objective function:

$$J = \frac{1}{N}\sum_{i=1}^{N}\lVert x_i - x'_i \rVert_2^2 + \frac{\lambda_1}{N}\sum_{i=1}^{N}\mathrm{Score}(z_i) + \lambda_2 P(\hat{\Sigma})$$

where $J$ is the objective function, $\lambda_1$ and $\lambda_2$ are manually set parameters, $x_i$ is the time series contained in the i-th data segment, $x'_i$ is the time series reconstructed from it, $z_i$ is the corresponding output of the data compressor, $N$ is the number of data segments, and $P(\hat{\Sigma})$ is the penalty term.

Compared with the prior art, the present invention has the following beneficial effects:

1) Before model training and anomaly detection, the time series data is segmented at the positions where it changes significantly, and the resulting segments are used for model training. Conventional anomaly detection methods slide a fixed-length time window to select time slices, which produces a large amount of redundant information in the segmented data and hinders feature learning by the neural network. Moreover, with fixed-length windows the data in each window does not carry a consistent meaning, so time series segments with similar physical meanings cannot be compared.

2) A density-estimation approach is adopted: the segmented training samples are treated as samples drawn from an unknown Gaussian mixture distribution, and a neural network is used to estimate the Gaussian mixture model of that distribution. Conventional methods consider only the probability distribution of the data as a whole, without accounting for the different feature distributions of individual segments.

3) In the training phase, the segmented data is fed into a many-to-many recurrent neural network that reconstructs the training samples. The hidden-layer output at the last time step, together with the relative distance and the cosine distance between the reconstructed and original sequences, is fed into a neural network that estimates the parameters of the Gaussian mixture model. Conventional methods use only the reconstruction error as the basis for estimating the Gaussian mixture model.

Description of Drawings

Fig. 1 is a schematic flow chart of the main steps of the present invention;

Fig. 2 is a flow chart of model training;

Fig. 3 is a schematic diagram of the structure of the neural network model used in the present invention;

Fig. 4 is a flow chart of anomaly prediction;

Fig. 5 compares the performance of the method of the present invention with that of existing methods.

Detailed Description of the Embodiments

The present invention is described in detail below with reference to the drawings and a specific embodiment. This embodiment is implemented on the premise of the technical solution of the present invention and gives a detailed implementation and a specific operating process, but the protection scope of the present invention is not limited to the following embodiment.

The time series anomaly detection method based on unsupervised learning consists of two main stages, model training and anomaly detection, as shown in Fig. 1, and comprises:

segmenting the time series data at the positions where it changes significantly, and padding each resulting data segment to a set length. The model training procedure that implements this is shown in Fig. 2; its data preprocessing comprises two steps:

Data segmentation: first find all extreme points of the sequence, then use the positions of the extreme points with large absolute values as split points to divide the time series into multiple data segments, where the split points are determined by a manually set threshold on the absolute value of the extreme points.

Data padding: pad each segmented sequence with zeros up to the input length of the anomaly detection model.

The segmented and padded data segments are each used as independent samples to train the subsequent model.
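A minimal Python sketch of the zero-padding step follows. It assumes NumPy and right-padding; the text specifies zero padding but not which end of the segment is padded. The helper name `pad_segments` is hypothetical. The helper also returns the true lengths so that later distance computations can ignore the padding.

```python
import numpy as np


def pad_segments(segments, input_len):
    """Right-pad each segment with zeros to input_len; return (matrix, true lengths)."""
    out = np.zeros((len(segments), input_len), dtype=float)
    lengths = []
    for i, seg in enumerate(segments):
        seg = np.asarray(seg, dtype=float)
        # The model's time-step count must exceed every possible segment length.
        if seg.size > input_len:
            raise ValueError(f"segment {i} has length {seg.size} > input_len {input_len}")
        out[i, : seg.size] = seg
        lengths.append(seg.size)
    return out, np.array(lengths)


X, lengths = pad_segments([[0.0, 1.0], [3.0, 1.0, 0.0, -1.0]], input_len=6)
print(X.shape, lengths)
```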

training a neural network for anomaly detection using the segmented and padded data segments of time series recorded under normal conditions;

feeding the segmented and padded data segments of the time series to be tested into the anomaly detection model, which outputs an anomaly score;

judging whether the anomaly score exceeds a threshold: if so, an anomaly is judged to have occurred; otherwise, no anomaly is judged to have occurred.

As shown in Fig. 3, the anomaly detection model comprises a data compressor and a Gaussian mixture model estimator, and the data compressor uses a many-to-many LSTM network structure.

The number of time steps in the data compressor's model is larger than the length of any sample that may be fed in. The time series sample input to the LSTM model is denoted x = [x_1, x_2, …, x_L], and the time series reconstructed by the LSTM network is denoted x' = [x'_1, x'_2, …, x'_L], where L is the length of the time series. The loss function for training the LSTM network is:

$$\mathcal{L}(x, x') = \sum_{i=1}^{L}\left(x_i - x'_i\right)^2$$

where $x_i$ is the i-th element of a time series sample, $x'_i$ is the i-th element of the reconstructed time series sample, and $L$ is the length of the time series.

Its compression process comprises:

compressing and reconstructing the data segment;

computing the relative distance and the cosine distance between the segment before compression and its reconstruction;

combining the relative distance, the cosine distance, and the output of the LSTM hidden layer into the input of the Gaussian mixture model estimator.

The relative distance is expressed as:

$$r = \frac{\sqrt{\sum_{i=1}^{L}\left(x_i - x'_i\right)^2}}{\sqrt{\sum_{i=1}^{L} x_i^2}}$$

where $r$ is the relative distance, $L$ is the length of the time series contained in the data segment, $x_i$ is an element of that time series, and $x'_i$ is the corresponding element of the reconstructed time series.

The cosine distance is expressed as:

$$c = \frac{\sum_{i=1}^{L} x_i x'_i}{\lVert x \rVert \, \lVert x' \rVert}$$

where $c$ is the cosine distance, $\lVert\cdot\rVert$ denotes the norm, $x_i$ is an element of the time series contained in the data segment, and $x'_i$ is the corresponding element of the reconstructed time series.
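The two distances and the assembly of the estimator input z can be sketched in Python as follows. The sketch assumes NumPy and the Euclidean norm, and it assumes that both distances are computed over the L unpadded elements of a segment. The constant `EPS` and all function names are illustrative assumptions. `h` stands for the LSTM hidden-layer output, whose size depends on the network configuration.

```python
import numpy as np

EPS = 1e-12  # assumption: guards against division by zero for all-zero segments


def relative_distance(x, x_rec):
    """r = ||x - x'|| / ||x||, computed over the L unpadded elements."""
    x, x_rec = np.asarray(x, float), np.asarray(x_rec, float)
    return float(np.linalg.norm(x - x_rec) / (np.linalg.norm(x) + EPS))


def cosine_distance(x, x_rec):
    """c = <x, x'> / (||x|| ||x'||)."""
    x, x_rec = np.asarray(x, float), np.asarray(x_rec, float)
    return float(x @ x_rec / (np.linalg.norm(x) * np.linalg.norm(x_rec) + EPS))


def estimator_input(h, x, x_rec):
    """Concatenate the LSTM hidden output h with [r, c] to form z."""
    feats = [relative_distance(x, x_rec), cosine_distance(x, x_rec)]
    return np.concatenate([np.asarray(h, float), feats])


z = estimator_input(h=[0.1, 0.2, 0.3], x=[3.0, 4.0], x_rec=[2.9, 4.1])
print(z)  # hidden features followed by r (small) and c (close to 1)
```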

The Gaussian mixture model estimator uses a multilayer perceptron (MLP) structure. Given the number K of Gaussian distributions used by the Gaussian mixture model, the estimator estimates three parameters of these K Gaussian distributions: the mixture probability Φ, the mean μ, and the covariance Σ.

The parameter estimation proceeds as follows:

(1) First, a multilayer neural network maps each input sample to a K-dimensional vector that determines how the sample contributes to the estimate of each Gaussian distribution. The mapping is:

$$p = \mathrm{MLN}(z;\theta), \qquad \hat{\gamma} = \mathrm{softmax}(p)$$

where $z$ is the data input to the Gaussian mixture model estimator, $\mathrm{MLN}(\cdot)$ is a multilayer neural network with parameters $\theta$, $\mathrm{softmax}(\cdot)$ is the softmax function, and $\hat{\gamma}$ is the K-dimensional membership vector used to estimate the Gaussian mixture model parameters.
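The mapping from z to the membership vector γ̂ can be sketched in Python as below. The patent does not specify the depth, width, or activation of MLN(·). The two-layer network with a tanh hidden layer and the random weights used here are purely hypothetical and serve only to show the shapes involved.

```python
import numpy as np


def softmax(p):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    p = p - p.max(axis=1, keepdims=True)
    e = np.exp(p)
    return e / e.sum(axis=1, keepdims=True)


def estimation_network(z, W1, b1, W2, b2):
    """Hypothetical two-layer MLN: p = tanh(z W1 + b1) W2 + b2, gamma = softmax(p)."""
    hidden = np.tanh(z @ W1 + b1)
    p = hidden @ W2 + b2
    return softmax(p)


rng = np.random.default_rng(0)
N, d, H, K = 6, 5, 8, 3          # samples, dimension of z, hidden units, Gaussians
z = rng.normal(size=(N, d))
gamma = estimation_network(z, rng.normal(size=(d, H)), np.zeros(H),
                           rng.normal(size=(H, K)), np.zeros(K))
print(gamma.shape, gamma.sum(axis=1))  # each row of gamma sums to 1
```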

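(2) The parameters of the Gaussian mixture model, namely the mixture probability Φ, the mean μ, and the covariance Σ, are estimated as follows:

$$\hat{\phi}_k = \sum_{i=1}^{N}\frac{\hat{\gamma}_{ik}}{N}, \qquad \hat{\mu}_k = \frac{\sum_{i=1}^{N}\hat{\gamma}_{ik} z_i}{\sum_{i=1}^{N}\hat{\gamma}_{ik}}, \qquad \hat{\Sigma}_k = \frac{\sum_{i=1}^{N}\hat{\gamma}_{ik}(z_i-\hat{\mu}_k)(z_i-\hat{\mu}_k)^{\top}}{\sum_{i=1}^{N}\hat{\gamma}_{ik}}$$

where $\hat{\phi}_k$, $\hat{\mu}_k$ and $\hat{\Sigma}_k$ are the mixture probability, mean, and covariance of the k-th Gaussian distribution, $\hat{\gamma}_{ik}$ is the k-th component of the membership vector of the i-th training sample, $z_i$ is the i-th training sample, and $N$ is the total number of training samples.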
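The three estimation formulas can be computed in vectorized form as follows. This is a Python sketch assuming NumPy arrays `z` of shape (N, d) and `gamma` of shape (N, K); the function name is hypothetical.

```python
import numpy as np


def gmm_parameters(z, gamma):
    """phi_k, mu_k, Sigma_k from samples z (N, d) and soft memberships gamma (N, K)."""
    n = z.shape[0]
    nk = gamma.sum(axis=0)                        # (K,) effective sample counts
    phi = nk / n                                  # mixture probabilities
    mu = (gamma.T @ z) / nk[:, None]              # (K, d) weighted means
    diff = z[None, :, :] - mu[:, None, :]         # (K, N, d) centred samples
    # Weighted outer products summed over samples -> (K, d, d) covariances.
    sigma = np.einsum("nk,kni,knj->kij", gamma, diff, diff) / nk[:, None, None]
    return phi, mu, sigma


z = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
phi, mu, sigma = gmm_parameters(z, gamma)
print(phi, mu.tolist())  # [0.5 0.5] [[1.0, 0.0], [11.0, 10.0]]
```

With hard (0/1) memberships, as in this toy example, the formulas reduce to per-cluster sample statistics. During training, `gamma` would come from the estimation network and the assignments would be soft.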

The anomaly score output by the Gaussian mixture model estimator is:

$$\mathrm{Score}(z) = -\log\left(\sum_{k=1}^{K} \hat{\phi}_k \frac{\exp\left(-\frac{1}{2}(z-\hat{\mu}_k)^{\top}\hat{\Sigma}_k^{-1}(z-\hat{\mu}_k)\right)}{\sqrt{\left|2\pi\hat{\Sigma}_k\right|}}\right)$$

where $z$ is the data input to the Gaussian mixture model estimator, $K$ is the given number of Gaussian distributions, and $\hat{\phi}_k$, $\hat{\mu}_k$ and $\hat{\Sigma}_k$ are the mixture probability, mean, and covariance of the k-th Gaussian distribution.
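The following Python sketch computes the anomaly score directly for a single z. It assumes NumPy, with parameters shaped as in the formulas above: `phi` of shape (K,), `mu` of shape (K, d), and `sigma` of shape (K, d, d). The small ridge `eps` added to each covariance is an assumption that keeps the inverse and determinant well defined; the patent does not mention it.

```python
import numpy as np


def anomaly_score(z, phi, mu, sigma, eps=1e-6):
    """Score(z) = -log( sum_k phi_k * N(z | mu_k, Sigma_k) )."""
    z = np.asarray(z, dtype=float)
    d = z.shape[0]
    total = 0.0
    for k in range(len(phi)):
        cov = sigma[k] + eps * np.eye(d)            # ridge keeps cov invertible
        diff = z - mu[k]
        maha = diff @ np.linalg.solve(cov, diff)    # (z-mu)^T Sigma^{-1} (z-mu)
        norm = np.sqrt(np.linalg.det(2 * np.pi * cov))
        total += phi[k] * np.exp(-0.5 * maha) / norm
    return float(-np.log(total))


phi = np.array([1.0])
mu = np.array([[0.0]])
sigma = np.array([[[1.0]]])
print(anomaly_score([0.0], phi, mu, sigma), anomaly_score([3.0], phi, mu, sigma))
```

Samples far from every fitted component receive a higher score. For high-dimensional z, the densities can underflow. A log-sum-exp formulation is a common remedy, but it is not part of the described method.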

The data compressor and the Gaussian mixture model estimator are trained end to end, with the following objective function:

$$J = \frac{1}{N}\sum_{i=1}^{N}\lVert x_i - x'_i \rVert_2^2 + \frac{\lambda_1}{N}\sum_{i=1}^{N}\mathrm{Score}(z_i) + \lambda_2 P(\hat{\Sigma})$$

where $J$ is the objective function, $\lambda_1$ and $\lambda_2$ are manually set parameters, $x_i$ is the time series contained in the i-th data segment, $x'_i$ is the time series reconstructed from it, $z_i$ is the corresponding output of the data compressor, $N$ is the number of data segments, and $P(\hat{\Sigma})$ is the penalty term, given by:

$$P(\hat{\Sigma}) = \sum_{k=1}^{K}\sum_{j=1}^{d}\frac{1}{\hat{\Sigma}_{kjj}}$$

where $d$ is the dimension of the sample $z$ input to the Gaussian mixture model estimator, $K$ is the given number of Gaussian distributions, and $\hat{\Sigma}_{kjj}$ is the j-th diagonal element of $\hat{\Sigma}_k$.
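The objective J and the penalty P(Σ̂) can be evaluated as below. This Python sketch assumes NumPy and uses the squared Euclidean reconstruction error averaged over the N segments. It also assumes the scores have already been computed for each segment. The values λ1 = 0.1 and λ2 = 0.005 in the example are hypothetical; the patent states only that both are set manually. In actual end-to-end training, this expression would be minimized with an automatic-differentiation framework, which is not shown.

```python
import numpy as np


def penalty(sigma):
    """P(Sigma) = sum_k sum_j 1 / Sigma_k[j, j]; grows as any variance shrinks toward 0."""
    diag = np.diagonal(sigma, axis1=1, axis2=2)   # shape (K, d)
    return float(np.sum(1.0 / diag))


def objective(x, x_rec, scores, sigma, lambda1, lambda2):
    """J = mean reconstruction error + lambda1 * mean score + lambda2 * P(Sigma)."""
    x, x_rec = np.asarray(x, float), np.asarray(x_rec, float)
    recon = np.mean(np.sum((x - x_rec) ** 2, axis=1))   # (1/N) sum_i ||x_i - x'_i||^2
    return float(recon + lambda1 * np.mean(scores) + lambda2 * penalty(sigma))


J = objective([[1.0, 2.0]], [[1.0, 2.0]], [2.0], np.array([np.eye(2)]),
              lambda1=0.1, lambda2=0.005)
print(J)  # approximately 0.21
```

The penalty discourages singular covariance matrices, which would otherwise let the score term be driven down by collapsing a component onto a few samples.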

The data segment used for anomaly detection is determined as follows: for each data segment produced by the split, compute the variance of the Gaussian distribution obtained through training, and select the segment that yields the smallest variance as the data fed into the anomaly detection model in the anomaly detection stage.
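The selection rule is stated only briefly, so this Python sketch rests on an explicit assumption. It uses the spread of the training representations z at each segment position (first segment, second segment, and so on), summed over dimensions, as a stand-in for the variance of the fitted Gaussian distribution. The function name and the input layout are hypothetical.

```python
import numpy as np


def select_segment_position(z_by_position):
    """Pick the segment position whose training representations vary least.

    Hypothetical layout: z_by_position maps a segment position (0 = first
    segment, ...) to an array of shape (n_samples, d) of z vectors.
    """
    spread = {
        pos: float(np.var(np.asarray(z, float), axis=0).sum())  # total variance
        for pos, z in z_by_position.items()
    }
    best = min(spread, key=spread.get)
    return best, spread


best, spread = select_segment_position({0: [[0, 0], [2, 2]], 1: [[1, 1], [1.1, 1]]})
print(best, spread)  # position 1 has the tighter distribution
```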

The flow chart of the anomaly detection stage is shown in Fig. 4; its data preprocessing comprises the following three steps:

(1) Data segmentation: first find all extreme points of the sequence, then use the position of the extreme point with the largest absolute value as the split point.

(2) Data padding: pad each segmented sequence with zeros up to the input length of the anomaly detection model.

(3) Select the data segment that was chosen for anomaly detection during the model training stage.
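The detection-stage preprocessing cuts only at the single extreme point with the largest absolute value. It can be sketched in Python as follows. The sketch assumes NumPy, the same extreme-point definition as in training, and right zero-padding, and it assumes that `position` is the segment index selected during training. All names are hypothetical.

```python
import numpy as np


def split_at_largest_extremum(x):
    """Cut x once, at the interior extreme point with the largest absolute value."""
    x = np.asarray(x, dtype=float)
    d = np.diff(x)
    ext = np.where(d[:-1] * d[1:] < 0)[0] + 1   # strict sign change of the slope
    if ext.size == 0:
        return [x]                               # no extreme point: nothing to cut
    cut = int(ext[np.argmax(np.abs(x[ext]))])
    return [x[:cut], x[cut:]]


def detection_input(x, input_len, position):
    """Segment, keep the segment position chosen during training, and zero-pad it."""
    segments = split_at_largest_extremum(x)
    if position >= len(segments):
        raise ValueError("series produced fewer segments than expected")
    seg = segments[position]
    if seg.size > input_len:
        raise ValueError("segment longer than the model input length")
    out = np.zeros(input_len)
    out[: seg.size] = seg
    return out


print(detection_input([0, 1, 3, 1, 0, -1, -4, -1, 0], input_len=8, position=1))
```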

The neural network model is the anomaly detection model trained in the model training stage, and τ is a manually set classification threshold on the anomaly score.
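The final decision compares each score against τ. The patent leaves τ to the user. The Python sketch below shows one common heuristic, purely as an assumption: take a high percentile of the anomaly scores that normal training data receives. It assumes NumPy, and the function names are hypothetical.

```python
import numpy as np


def pick_threshold(train_scores, percentile=95.0):
    """Hypothetical rule: tau = given percentile of scores on normal training data."""
    return float(np.percentile(train_scores, percentile))


def detect(scores, tau):
    """True where the anomaly score exceeds tau (anomaly), False otherwise."""
    return np.asarray(scores, dtype=float) > tau


tau = pick_threshold(np.linspace(0.0, 10.0, 101))
print(tau, detect([1.2, 9.8], tau))
```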

The performance of the above method was evaluated on the Two-lead ECG dataset, using the ROC curve and the AUC as performance metrics. The proposed method achieves an AUC of 0.8396573. Fig. 5 compares the performance of the proposed method with that of other methods on the same dataset, where Seq2Cluster denotes the proposed method. The comparison shows that the proposed method outperforms the existing unsupervised anomaly detection methods of the same kind listed there, indicating that the anomaly detection method described in this patent is advanced.
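For reference, the AUC used as the evaluation metric equals the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one, with ties counting one half. A small pure-Python sketch of that definition follows. Its cost is quadratic in the number of samples, and it is meant only to illustrate the metric, not to reproduce the reported figure.

```python
def roc_auc(scores, labels):
    """AUC = P(score of a random anomaly > score of a random normal sample)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]   # anomalous samples
    neg = [s for s, y in zip(scores, labels) if y == 0]   # normal samples
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```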

Claims (10)

1. A time series anomaly detection method based on unsupervised learning, characterized by comprising: segmenting time series data at the positions where it changes significantly, and padding each resulting data segment to a set length; training an anomaly detection model using, as input, the segmented and padded data segments of time series recorded under normal conditions; feeding the segmented and padded data segments of a time series to be tested into the anomaly detection model for detection, and outputting an anomaly score; and judging whether the anomaly score exceeds a threshold: if so, judging that an anomaly has occurred; otherwise, judging that no anomaly has occurred.

2. The time series anomaly detection method based on unsupervised learning according to claim 1, characterized in that segmenting the time series data at the positions where it changes significantly specifically comprises: finding all extreme points of the time series data; and then using the positions of the extreme points whose absolute values exceed a set threshold as split points to divide the time series into multiple data segments.

3. The time series anomaly detection method based on unsupervised learning according to claim 1, characterized in that the anomaly detection model comprises a data compressor and a Gaussian mixture model estimator, the data compressor using a many-to-many LSTM network structure and the Gaussian mixture model estimator using a multilayer perceptron structure.

4. The time series anomaly detection method based on unsupervised learning according to claim 3, characterized in that the compression process of the data compressor comprises: compressing and reconstructing the data segment; computing the relative distance and the cosine distance before and after compression; and combining the relative distance, the cosine distance, and the output of the LSTM hidden layer into the input of the Gaussian mixture model estimator.

5. The time series anomaly detection method based on unsupervised learning according to claim 4, characterized in that the relative distance is expressed as:

$$r = \frac{\sqrt{\sum_{i=1}^{L}\left(x_i - x'_i\right)^2}}{\sqrt{\sum_{i=1}^{L} x_i^2}}$$

where $r$ is the relative distance, $L$ is the length of the time series contained in the data segment, $x_i$ is an element of that time series, and $x'_i$ is the corresponding element of the reconstructed time series.

6. The time series anomaly detection method based on unsupervised learning according to claim 4, characterized in that the cosine distance is expressed as:

$$c = \frac{\sum_{i=1}^{L} x_i x'_i}{\lVert x \rVert \, \lVert x' \rVert}$$

where $c$ is the cosine distance, $\lVert\cdot\rVert$ denotes the norm, $x_i$ is an element of the time series contained in the data segment, and $x'_i$ is the corresponding element of the reconstructed time series.

7. The time series anomaly detection method based on unsupervised learning according to claim 4, characterized in that the training process of the Gaussian mixture model estimator comprises: receiving the output of the data compressor and mapping it to a K-dimensional vector using a multilayer neural network, where K is the number of Gaussian distributions in the model; and obtaining the mixture probability, mean, and covariance of each Gaussian distribution from the elements of the K-dimensional vector using a multilayer perceptron model; and the detection process of the Gaussian mixture model comprises: receiving the output of the data compressor and computing the anomaly score.

8. The time series anomaly detection method based on unsupervised learning according to claim 7, characterized in that the anomaly score is expressed as:

$$\mathrm{Score}(z) = -\log\left(\sum_{k=1}^{K} \hat{\phi}_k \frac{\exp\left(-\frac{1}{2}(z-\hat{\mu}_k)^{\top}\hat{\Sigma}_k^{-1}(z-\hat{\mu}_k)\right)}{\sqrt{\left|2\pi\hat{\Sigma}_k\right|}}\right)$$

where $\mathrm{Score}(z)$ is the anomaly score, $\hat{\phi}_k$ is the mixture probability of the k-th Gaussian distribution, $\hat{\Sigma}_k$ is the covariance of the k-th Gaussian distribution, $z$ is the output of the data compressor, $\hat{\mu}_k$ is the mean of the k-th Gaussian distribution, and $\hat{\Sigma}_k^{-1}$ is the inverse matrix of $\hat{\Sigma}_k$.

9. The time series anomaly detection method based on unsupervised learning according to claim 8, characterized in that the mixture probability of the k-th Gaussian distribution is:

$$\hat{\phi}_k = \sum_{i=1}^{N}\frac{\hat{\gamma}_{ik}}{N}$$

the mean of the k-th Gaussian distribution is:

$$\hat{\mu}_k = \frac{\sum_{i=1}^{N}\hat{\gamma}_{ik} z_i}{\sum_{i=1}^{N}\hat{\gamma}_{ik}}$$

and the covariance of the k-th Gaussian distribution is:

$$\hat{\Sigma}_k = \frac{\sum_{i=1}^{N}\hat{\gamma}_{ik}(z_i-\hat{\mu}_k)(z_i-\hat{\mu}_k)^{\top}}{\sum_{i=1}^{N}\hat{\gamma}_{ik}}$$

where $N$ is the total number of training samples, $\hat{\gamma}_{ik}$ is the k-th component of the K-dimensional vector computed for the i-th training sample, and $z_i$ is the i-th training sample.

10. The time series anomaly detection method based on unsupervised learning according to claim 3, characterized in that the data compressor and the Gaussian mixture model estimator are trained end to end, with the following objective function:

$$J = \frac{1}{N}\sum_{i=1}^{N}\lVert x_i - x'_i \rVert_2^2 + \frac{\lambda_1}{N}\sum_{i=1}^{N}\mathrm{Score}(z_i) + \lambda_2 P(\hat{\Sigma})$$

where $J$ is the objective function, $\lambda_1$ and $\lambda_2$ are manually set parameters, $x_i$ is the time series contained in the i-th data segment, $x'_i$ is the time series reconstructed from it, $z_i$ is the corresponding output of the data compressor, $N$ is the number of data segments, and $P(\hat{\Sigma})$ is the penalty term.
CN201910234623.8A 2019-03-26 2019-03-26 Unsupervised learning-based time series anomaly detection method Active CN110071913B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910234623.8A CN110071913B (en) 2019-03-26 2019-03-26 Unsupervised learning-based time series anomaly detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910234623.8A CN110071913B (en) 2019-03-26 2019-03-26 Unsupervised learning-based time series anomaly detection method

Publications (2)

Publication Number Publication Date
CN110071913A CN110071913A (en) 2019-07-30
CN110071913B CN110071913B (en) 2020-10-02

Family

ID=67366741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910234623.8A Active CN110071913B (en) 2019-03-26 2019-03-26 Unsupervised learning-based time series anomaly detection method

Country Status (1)

Country Link
CN (1) CN110071913B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110430183A (en) * 2019-07-31 2019-11-08 福建师范大学 MH-LSTM anomaly detection method based on dialogue characteristic similarity
CN110460598A (en) * 2019-08-12 2019-11-15 西北工业大学深圳研究院 Anomaly Detection Method for Spatiotemporal Migration of Network Traffic
CN110717601A (en) * 2019-10-15 2020-01-21 厦门铅笔头信息科技有限公司 Anti-fraud method based on supervised learning and unsupervised learning
CN110895598A (en) * 2019-10-23 2020-03-20 山东九州信泰信息科技股份有限公司 Real-time anomaly detection parallelization method based on multi-source prediction
CN111177224A (en) * 2019-12-30 2020-05-19 浙江大学 An Unsupervised Anomaly Detection Method for Time Series Based on Conditional Normalized Flow Model
CN111562996A (en) * 2020-04-11 2020-08-21 北京交通大学 Method and system for detecting time sequence abnormality of key performance index data
CN111884874A (en) * 2020-07-15 2020-11-03 中国舰船研究设计中心 Programmable data plane-based ship network real-time anomaly detection method
CN112040338A (en) * 2020-07-31 2020-12-04 中国建设银行股份有限公司 Video playing cheating detection method and device and electronic equipment
CN112131272A (en) * 2020-09-22 2020-12-25 平安科技(深圳)有限公司 Detection method, device, equipment and storage medium for multi-element KPI time sequence
CN112257917A (en) * 2020-10-19 2021-01-22 北京工商大学 Time series abnormal mode detection method based on entropy characteristics and neural network
CN112416643A (en) * 2020-11-26 2021-02-26 清华大学 Unsupervised anomaly detection method and unsupervised anomaly detection device
CN112445842A (en) * 2020-11-20 2021-03-05 北京思特奇信息技术股份有限公司 Abnormal value detection method and system based on time series data
CN112511538A (en) * 2020-11-30 2021-03-16 杭州安恒信息技术股份有限公司 Network security detection method based on time sequence and related components
CN112751813A (en) * 2019-10-31 2021-05-04 国网浙江省电力有限公司 Network intrusion detection method and device
CN113076215A (en) * 2021-04-08 2021-07-06 华南理工大学 Unsupervised anomaly detection method independent of data types
CN113098640A (en) * 2021-03-26 2021-07-09 电子科技大学 Frequency spectrum anomaly detection method based on channel occupancy prediction
CN113469247A (en) * 2021-06-30 2021-10-01 广州天懋信息系统股份有限公司 Network asset anomaly detection method
CN113553232A (en) * 2021-07-12 2021-10-26 厦门大学 Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix portrait
CN117539739A (en) * 2023-12-11 2024-02-09 国网河南省电力公司经济技术研究院 User continuous behavior anomaly monitoring method based on double features

Citations (9)

Publication number Priority date Publication date Assignee Title
US20060047807A1 (en) * 2004-08-25 2006-03-02 Fujitsu Limited Method and system for detecting a network anomaly in a network
CN101826070A (en) * 2010-04-27 2010-09-08 上海第二工业大学 Key point-based data sequence linear fitting method
CN103150364A (en) * 2013-03-04 2013-06-12 福建师范大学 Time series feature extraction method
CN103561418A (en) * 2013-11-07 2014-02-05 东南大学 Anomaly detection method based on time series
CN104156473A (en) * 2014-08-25 2014-11-19 哈尔滨工业大学 LS-SVM-based method for detecting anomaly slot of sensor detection data
CN104915846A (en) * 2015-06-18 2015-09-16 北京京东尚科信息技术有限公司 Electronic commerce time sequence data anomaly detection method and system
US20160285700A1 (en) * 2015-03-24 2016-09-29 Futurewei Technologies, Inc. Adaptive, Anomaly Detection Based Predictor for Network Time Series Data
CN106368816A (en) * 2016-10-27 2017-02-01 中国船舶工业系统工程研究院 Method for online anomaly detection of low-speed marine diesel engines based on baseline deviation
CN108647737A (en) * 2018-05-17 2018-10-12 哈尔滨工业大学 Cluster-based adaptive time sequence variation detection method and device

Non-Patent Citations (3)

Title
YONGJUN JIN, CHENLU QIU, LEI SUN, XUAN PENG, JIANNING ZHOU: "Anomaly detection in time series via robust PCA", 《2017 2ND IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION ENGINEERING (ICITE)》 *
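ZHUANG CHIJIE, ZHANG BIN, HU JUN, LI QIUSHUO, ZENG RONG (庄池杰, 张斌, 胡军, 李秋硕, 曾嵘): "Detection of abnormal electricity consumption patterns of power users based on unsupervised learning", 《Proceedings of the CSEE (中国电机工程学报)》 *
LI HONGLI (李鸿利): "Research on behavior matching and evaluation techniques for time series", 《China Master's Theses Full-text Database, Basic Sciences》 *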

Cited By (29)

Publication number Priority date Publication date Assignee Title
CN110430183A (en) * 2019-07-31 2019-11-08 福建师范大学 MH-LSTM anomaly detection method based on dialogue characteristic similarity
CN110460598A (en) * 2019-08-12 2019-11-15 西北工业大学深圳研究院 Anomaly Detection Method for Spatiotemporal Migration of Network Traffic
CN110460598B (en) * 2019-08-12 2021-08-17 西北工业大学深圳研究院 Anomaly detection method for network traffic spatiotemporal migration
CN110717601A (en) * 2019-10-15 2020-01-21 厦门铅笔头信息科技有限公司 Anti-fraud method based on supervised learning and unsupervised learning
CN110717601B (en) * 2019-10-15 2022-05-03 厦门铅笔头信息科技有限公司 Anti-fraud method based on supervised learning and unsupervised learning
CN110895598B (en) * 2019-10-23 2021-09-14 山东九州信泰信息科技股份有限公司 Real-time anomaly detection parallelization method based on multi-source prediction
CN110895598A (en) * 2019-10-23 2020-03-20 山东九州信泰信息科技股份有限公司 Real-time anomaly detection parallelization method based on multi-source prediction
CN112751813A (en) * 2019-10-31 2021-05-04 国网浙江省电力有限公司 Network intrusion detection method and device
CN111177224A (en) * 2019-12-30 2020-05-19 浙江大学 An Unsupervised Anomaly Detection Method for Time Series Based on Conditional Normalized Flow Model
CN111562996A (en) * 2020-04-11 2020-08-21 北京交通大学 Method and system for detecting time sequence abnormality of key performance index data
CN111562996B (en) * 2020-04-11 2021-11-23 北京交通大学 Method and system for detecting time sequence abnormality of key performance index data
CN111884874A (en) * 2020-07-15 2020-11-03 中国舰船研究设计中心 Programmable data plane-based ship network real-time anomaly detection method
CN111884874B (en) * 2020-07-15 2022-02-01 中国舰船研究设计中心 Programmable data plane-based ship network real-time anomaly detection method
CN112040338A (en) * 2020-07-31 2020-12-04 中国建设银行股份有限公司 Video playing cheating detection method and device and electronic equipment
CN112131272B (en) * 2020-09-22 2023-11-10 平安科技(深圳)有限公司 Method, device, equipment and storage medium for detecting multi-element KPI time sequence
CN112131272A (en) * 2020-09-22 2020-12-25 平安科技(深圳)有限公司 Detection method, device, equipment and storage medium for multi-element KPI time sequence
CN112257917A (en) * 2020-10-19 2021-01-22 北京工商大学 Time series abnormal mode detection method based on entropy characteristics and neural network
CN112257917B (en) * 2020-10-19 2023-05-12 北京工商大学 Time sequence abnormal mode detection method based on entropy characteristics and neural network
CN112445842A (en) * 2020-11-20 2021-03-05 北京思特奇信息技术股份有限公司 Abnormal value detection method and system based on time series data
CN112416643A (en) * 2020-11-26 2021-02-26 清华大学 Unsupervised anomaly detection method and unsupervised anomaly detection device
CN112511538A (en) * 2020-11-30 2021-03-16 杭州安恒信息技术股份有限公司 Network security detection method based on time sequence and related components
CN113098640B (en) * 2021-03-26 2022-03-08 电子科技大学 Frequency spectrum anomaly detection method based on channel occupancy prediction
CN113098640A (en) * 2021-03-26 2021-07-09 电子科技大学 Frequency spectrum anomaly detection method based on channel occupancy prediction
CN113076215A (en) * 2021-04-08 2021-07-06 华南理工大学 Unsupervised anomaly detection method independent of data types
CN113469247B (en) * 2021-06-30 2022-04-01 广州天懋信息系统股份有限公司 Network asset anomaly detection method
CN113469247A (en) * 2021-06-30 2021-10-01 广州天懋信息系统股份有限公司 Network asset anomaly detection method
CN113553232A (en) * 2021-07-12 2021-10-26 厦门大学 Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix portrait
CN113553232B (en) * 2021-07-12 2023-12-05 厦门大学 Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image
CN117539739A (en) * 2023-12-11 2024-02-09 国网河南省电力公司经济技术研究院 User continuous behavior anomaly monitoring method based on double features

Also Published As

Publication number Publication date
CN110071913B (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN110071913B (en) Unsupervised learning-based time series anomaly detection method
CN107941537B (en) A method for evaluating the health status of mechanical equipment
CN109948117B (en) Satellite anomaly detection method for network self-encoder
CN117421684B (en) Abnormal data monitoring and analyzing method based on data mining and neural network
CN112200244B (en) Intelligent detection method for aerospace engine anomalies based on hierarchical adversarial training
CN109522948A (en) Fault detection method based on orthogonal locality preserving projections
Xu et al. Global attention mechanism based deep learning for remaining useful life prediction of aero-engine
CN116610998A (en) Switch cabinet fault diagnosis method and system based on multi-mode data fusion
CN104751229A (en) Bearing fault diagnosis method capable of recovering missing data of back propagation neural network estimation values
CN112861443B (en) Advanced learning fault diagnosis method integrated with priori knowledge
CN109117774A (en) Multi-angle video anomaly detection method based on sparse coding
CN109460005A (en) Dynamic industrial process method for diagnosing faults based on GRU deep neural network
CN116304604A (en) Multivariate time series data anomaly detection, model training method and system
CN115081331A (en) An abnormal detection method of wind turbine operating state based on state parameter reconstruction error
CN115392381A (en) Anomaly Detection Method for Time Series Based on Unscented Kalman Filter
CN117645220A (en) Intelligent monitoring system and method for elevator running state
CN118332291A (en) A method for predicting aircraft multi-sensor data faults
CN116628621A (en) Method, device, equipment and storage medium for diagnosing abnormal event of multi-element time sequence data
CN115712833A (en) Unsupervised anomaly detection method for multi-dimensional unmanned aerial vehicle flight data based on time-space correlation
CN114464319B (en) A susceptibility assessment system for AMS based on slow feature analysis and deep neural network
CN116306806A (en) Fault diagnosis model determining method and device and nonvolatile storage medium
CN115326242A (en) On-line performance evaluation and fault diagnosis method and system of transmission line condition monitoring sensor
CN118883065A (en) A bearing early abnormality detection method based on Transformer model
CN118503879A (en) Method for detecting anomalies in operation and maintenance data of an aircraft power supply system based on a deep autoencoder
CN114185321A (en) A fault diagnosis method for electric actuators based on improved multi-class twin support vector machines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant