KR102502648B1

KR102502648B1 - Method and devices for detecting anomaly in a time series using rnn

Info

Publication number: KR102502648B1
Application number: KR1020200093243A
Authority: KR
Inventors: 전인태
Original assignee: 가톨릭대학교 산학협력단
Priority date: 2020-07-27
Filing date: 2020-07-27
Publication date: 2023-02-22
Also published as: KR20220013811A

Abstract

A method and apparatus for detecting anomalies in a time series are disclosed. An anomaly detection method performed by a detection system according to an embodiment may include: training a recurrent neural network composed of an encoder and a decoder to determine anomalies using time-series data; distinguishing a normal section, a caution section, and an anomalous section of the time-series data based on scores calculated by the recurrent neural network composed of the encoder and the decoder; inputting new time-series data generated in real time to the trained recurrent neural network; and determining, using the trained recurrent neural network, whether an anomaly has occurred in the new time-series data generated in real time.

Description

Method and device for detecting anomalies in time series using RNN encoder-decoder {METHOD AND DEVICES FOR DETECTING ANOMALY IN A TIME SERIES USING RNN}

The following description relates to a technique for detecting anomalies in time-series data. More specifically, it relates to a method and apparatus that use an encoder-decoder scheme based on an RNN, a type of artificial neural network, to perform supervised or unsupervised learning depending on the type of data, classify the data, and thereby detect in real time whether an anomalous situation occurs in a time series.

The recurrent neural network (RNN), one of the artificial neural network models that have been developing rapidly in recent years, is widely used to process natural language and to analyze and predict time series. Long short-term memory (LSTM) was developed to compensate for a weakness of the plain RNN, which makes poor use of information from data far in the past. An encoder-decoder model that combines two LSTMs takes one sentence as input and outputs a corresponding new sentence, and it performs particularly well in sentence translation.

In addition, as industry develops, vast amounts of information and data are generated and transmitted, and technology that can tell whether such data are normal or are anomalous data containing an abnormal situation is becoming increasingly important. In particular, real-time anomaly detection for time-series data arriving in real time is not yet well established, and its accuracy is not high.

Research on anomaly detection in time-series data has been moving beyond conventional statistical models toward methods that use deep-learning LSTM or encoder-decoder models. Most of this research predicts the future with an RNN-family method and raises an alarm that an anomaly has occurred or may occur when an indicator extracted from the prediction exceeds a certain threshold, and it shows higher accuracy than conventional statistical methods.

However, although future prediction for general time series with the above methods is more accurate than with conventional statistical methods, it is still inaccurate in many cases. In particular, artificial intelligence algorithms that detect anomalies in real time through unsupervised learning, using only data for which no information about anomalies is given, have not been well studied.

In addition, most existing studies are neither specific nor well founded about how the threshold is determined. In particular, for most time-series data it is unknown whether the data contain anomalies, that is, the data have no labels. Existing studies offer no solution for how to find the threshold in that case, and it is difficult to detect anomalous situations with a threshold that is set arbitrarily.

Methods that predict anomalies in real time from time-series data supplied in real time are not yet widely used.

In modern society, vast amounts of data are generated every moment. Information of many kinds, such as the operating sounds of machines, human brainwaves, stock prices and exchange rates, and CCTV footage, is collected as data in real time in the form of time series.

Within these time series, however, anomalous phenomena far from the normal situation often occur, such as machine malfunctions, health problems, stock price manipulation, and intrusion by outsiders. Because people often cannot monitor such situations in real time, a technology is needed that detects them automatically and notifies people so that they can respond quickly.

The present invention may include an operation of using deep learning to quickly capture and report whether an anomaly has occurred in time-series data supplied in real time. In particular, it can provide a method and apparatus that detect anomalous situations not only when it is known whether the existing data contain anomalies, but also when this is not known.

An anomaly detection method performed by a detection system may include: training a recurrent neural network composed of an encoder and a decoder to determine anomalies using time-series data; distinguishing a normal section, a caution section, and an anomalous section of the time-series data based on scores calculated by the recurrent neural network composed of the encoder and the decoder; inputting new time-series data generated in real time to the trained recurrent neural network; and determining, using the trained recurrent neural network, whether an anomaly has occurred in the new time-series data generated in real time.

The training step may include: determining whether labels exist in the time-series data; when no labels exist, performing an identification process that distinguishes normal data from anomalous data in the time-series data; calculating an anomaly score for each piece of data in the time-series data; and distinguishing normal data from anomalous data, once or sequentially, using the calculated anomaly scores.

The training step may include: constructing, from the time-series data, a data set (HD) of data whose anomaly score is at or above a preset criterion; training the recurrent neural network composed of the encoder-decoder using a loss function and a data set (LD) from which the data whose anomaly score is at or above the preset criterion have been removed; calculating, through the training, the anomaly scores for the data set (LD) and an N-score, which is the maximum of the anomaly scores for the data set (LD); and calculating, through the training, the anomaly scores for the data set (HD) and an A-score, which is the minimum of the anomaly scores for the data set (HD).

The distinguishing step may include: when the calculated N-score is smaller than the calculated A-score, judging scores below the N-score to be the normal zone, scores at or above the N-score and below the A-score to be the caution zone, and scores at or above the A-score to be the anomalous zone; when the calculated N-score is greater than the calculated A-score, judging scores below the A-score to be the normal zone, scores at or above the A-score and below the N-score to be the caution zone, and scores at or above the N-score to be the anomalous zone; searching for boundary data that separate normal data from anomalous data in the time-series data, based on a boundary point in the tail of the anomaly score distribution at which the difference is at or above a preset criterion; and detecting anomalies, covering the normal state, the caution state, and the anomalous state of the time-series data, using the boundary data found.
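The zone rule above can be illustrated with a short Python sketch. It assumes the anomaly scores of LD and HD have already been produced by the trained network; the function names `n_and_a_scores` and `classify_zone` are hypothetical and do not come from the patent. The text does not say what happens when the N-score equals the A-score, so the sketch simply lets the caution zone collapse in that case.

```python
from typing import Sequence, Tuple


def n_and_a_scores(ld_scores: Sequence[float],
                   hd_scores: Sequence[float]) -> Tuple[float, float]:
    """N-score = maximum anomaly score over LD, A-score = minimum over HD."""
    return max(ld_scores), min(hd_scores)


def classify_zone(score: float, n_score: float, a_score: float) -> str:
    """Map one anomaly score to a zone.

    Whichever of the two scores is smaller closes the normal zone, and
    whichever is larger opens the anomalous zone, which covers both the
    N < A and the N > A branches described in the text.
    """
    low, high = min(n_score, a_score), max(n_score, a_score)
    if score < low:
        return "normal"
    if score < high:
        return "caution"
    return "anomalous"


if __name__ == "__main__":
    # Illustrative post-training scores (made-up numbers).
    n, a = n_and_a_scores([0.10, 0.20, 0.35], [0.30, 0.90])
    for s in (0.10, 0.32, 0.35):
        print(s, classify_zone(s, n, a))
```

Using `min` and `max` avoids writing the two branches separately; an implementation that needs to report which branch applied would keep them explicit.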

The training step may include, when labels exist in the time-series data: classifying normal data and anomalous data from the time-series data; calculating, through the training, an N-score for the classified normal data; and calculating an A-score using the classified anomalous data. The distinguishing step may include: when the calculated N-score is smaller than the calculated A-score, judging scores below the calculated A-score to be the normal zone, scores at or above the N-score and below the A-score to be the caution zone, and scores at or above the A-score to be the anomalous zone; and when the calculated N-score is greater than the calculated A-score, judging scores below the A-score to be the normal zone, scores at or above the A-score and below the N-score to be the caution zone, and scores at or above the N-score to be the anomalous zone.
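For the labeled case, the two scores might be obtained as in the following Python sketch. It assumes that the N-score and A-score keep the definitions given for the unlabeled case (maximum score of the normal data, minimum score of the anomalous data) and that labels are encoded as 0 for normal and 1 for anomalous; both the encoding and the function name `n_and_a_from_labels` are illustrative assumptions.

```python
from typing import Sequence, Tuple


def n_and_a_from_labels(scores: Sequence[float],
                        labels: Sequence[int]) -> Tuple[float, float]:
    """Return (N-score, A-score) from per-sample scores and 0/1 labels."""
    normal = [s for s, y in zip(scores, labels) if y == 0]
    anomalous = [s for s, y in zip(scores, labels) if y == 1]
    if not normal or not anomalous:
        # Both thresholds need at least one sample of each class.
        raise ValueError("both normal and anomalous samples are required")
    return max(normal), min(anomalous)


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.2, 0.9, 0.3]
    labels = [0, 0, 0, 1, 1]
    print(n_and_a_from_labels(scores, labels))
```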

The determining step may include calculating an anomaly score for the new time-series data using the trained recurrent neural network composed of the encoder-decoder, and determining, using the calculated anomaly score, whether an anomaly has occurred, where the result is one of normal data, caution data, or anomalous data.
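A minimal Python sketch of this real-time step is given below. The trained encoder-decoder is replaced by a placeholder `score_fn`, and the two thresholds are passed in as `low` and `high`; all of these names, and the toy scoring function, are assumptions made for illustration only.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple


def monitor(stream: Iterable[Sequence[float]],
            score_fn: Callable[[Sequence[float]], float],
            low: float,
            high: float) -> Iterator[Tuple[float, str]]:
    """Score each incoming window and yield (score, state) as it arrives."""
    for window in stream:
        score = score_fn(window)
        if score < low:
            state = "normal"
        elif score < high:
            state = "caution"
        else:
            state = "anomalous"
        yield score, state


def toy_score(window: Sequence[float]) -> float:
    """Stand-in for the trained network: the range of values in the window."""
    return max(window) - min(window)


if __name__ == "__main__":
    windows = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 5.0, 1.0]]
    for score, state in monitor(windows, toy_score, low=0.5, high=2.0):
        print(score, state)
```

Writing `monitor` as a generator lets it consume an unbounded stream and emit a decision per window without buffering the whole series.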

A program stored in a computer-readable storage medium to execute the anomaly detection method performed by the detection system may include: training a recurrent neural network composed of an encoder and a decoder to determine anomalies using time-series data; distinguishing a normal section, a caution section, and an anomalous section of the time-series data based on scores calculated by the recurrent neural network composed of the encoder and the decoder; inputting new time-series data generated in real time to the trained recurrent neural network; and determining, using the trained recurrent neural network, whether an anomaly has occurred in the new time-series data generated in real time.

The training step may include: determining whether labels exist in the time-series data; when no labels exist, performing an identification process that distinguishes normal data from anomalous data in the time-series data; constructing, from the time-series data, a data set (HD) of data whose anomaly score is at or above a preset criterion; training the recurrent neural network composed of the encoder-decoder using a loss function and a data set (LD) from which the data whose anomaly score is at or above the preset criterion have been removed; calculating, through the training, the anomaly scores for the data set (LD) and an N-score, which is the maximum of the anomaly scores for the data set (LD); and calculating, through the training, the anomaly scores for the data set (HD) and an A-score, which is the minimum of the anomaly scores for the data set (HD). When labels exist in the time-series data, the training step may classify normal data and anomalous data from the time-series data through the training, calculate an N-score for the classified normal data, and calculate an A-score using the classified anomalous data.

The distinguishing step may include: when the calculated N-score is smaller than the calculated A-score, judging scores below the N-score to be the normal zone, scores at or above the N-score and below the A-score to be the caution zone, and scores at or above the A-score to be the anomalous zone; and when the calculated N-score is greater than the calculated A-score, judging scores below the A-score to be the normal zone, scores at or above the A-score and below the N-score to be the caution zone, and scores at or above the N-score to be the anomalous zone.

The distinguishing step may include: searching for boundary data that separate normal data from anomalous data in the time-series data, based on a boundary point in the tail of the anomaly score distribution at which the difference is at or above a preset criterion; and detecting anomalies, covering the normal state, the caution state, and the anomalous state of the time-series data, using the boundary data found.
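One way to read the boundary search is sketched below in Python: sort the anomaly scores, restrict attention to the upper tail, and return the first score that follows a jump of at least a preset size. The parameters `tail_fraction` and `min_gap`, their default values, and the function name are assumptions for illustration; the text does not fix how the tail or the criterion is chosen.

```python
from typing import List, Optional


def find_tail_boundary(scores: List[float],
                       tail_fraction: float = 0.3,
                       min_gap: float = 0.5) -> Optional[float]:
    """Return the first tail score preceded by a gap of at least min_gap.

    Scores at or above the returned value would be treated as anomalous.
    Returns None when no adjacent pair in the tail differs by min_gap.
    """
    ordered = sorted(scores)
    if len(ordered) < 2:
        return None
    # Index where the tail starts; always leave at least one pair to compare.
    start = min(int(len(ordered) * (1.0 - tail_fraction)), len(ordered) - 2)
    for i in range(start, len(ordered) - 1):
        if ordered[i + 1] - ordered[i] >= min_gap:
            return ordered[i + 1]
    return None


if __name__ == "__main__":
    scores = [0.10, 0.12, 0.15, 0.11, 0.13, 0.14, 0.16, 0.18, 0.95, 1.10]
    print(find_tail_boundary(scores))
```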

A detection system for detecting anomalies may include: a training unit that trains a recurrent neural network composed of an encoder and a decoder to determine anomalies using time-series data; a distinguishing unit that distinguishes a normal section, a caution section, and an anomalous section of the time-series data based on scores calculated by the recurrent neural network composed of the encoder and the decoder; an input unit that inputs new time-series data generated in real time to the trained recurrent neural network; and a determination unit that determines, using the trained recurrent neural network, whether an anomaly has occurred in the new time-series data generated in real time.

The training unit may: determine whether labels exist in the time-series data; when no labels exist, perform an identification process that distinguishes normal data from anomalous data in the time-series data; calculate an anomaly score for each piece of data in the time-series data; distinguish normal data from anomalous data, once or sequentially, using the calculated anomaly scores; construct, from the time-series data, a data set (HD) of data whose anomaly score is at or above a preset criterion; train the recurrent neural network composed of the encoder-decoder using a loss function and a data set (LD) from which the data whose anomaly score is at or above the preset criterion have been removed; calculate, through the training, the anomaly scores for the data set (LD) and an N-score, which is the maximum of the anomaly scores for the data set (LD); and calculate, through the training, the anomaly scores for the data set (HD) and an A-score, which is the minimum of the anomaly scores for the data set (HD). When labels exist in the time-series data, the training unit may classify normal data and anomalous data from the time-series data through the training, calculate an N-score for the classified normal data, and calculate an A-score using the classified anomalous data.

In the distinguishing step, when the calculated N-score is smaller than the calculated A-score, scores below the calculated A-score may be judged to be the normal zone, scores at or above the N-score and below the A-score the caution zone, and scores at or above the A-score the anomalous zone; when the calculated N-score is greater than the calculated A-score, scores below the A-score may be judged to be the normal zone, scores at or above the A-score and below the N-score the caution zone, and scores at or above the N-score the anomalous zone.

The distinguishing unit may search for boundary data that separate normal data from anomalous data in the time-series data, based on a boundary point in the tail of the anomaly score distribution at which the difference is at or above a preset criterion, and may detect anomalies, covering the normal state, the caution state, and the anomalous state of the time-series data, using the boundary data found.

Existing methods for identifying anomalous phenomena are vague about how to select the threshold that defines an anomalous state and do not offer a clear methodology for real-time use. To address these problems, an anomaly score is assigned to each piece of time-series data using the encoder-decoder method, and a way of applying it in real time has been devised. This is expected to enable broader use in many fields that require real-time monitoring, such as machinery, finance, and IoT.

FIG. 1 is a block diagram illustrating the configuration of a detection system according to an embodiment.
FIG. 2 is a flowchart illustrating a method of detecting anomalies in time-series data in a detection system according to an embodiment.
FIG. 3 is a diagram illustrating an operation of detecting anomalies in time-series data in a detection system.
FIG. 4 is a diagram illustrating a recurrent neural network composed of an encoder-decoder in a detection system according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

The embodiments describe a technique for detecting the occurrence of an anomaly in time-series data using an encoder-decoder based on a deep-learning recurrent neural network (RNN). In detecting the occurrence of an anomaly, a threshold that reflects the characteristics of the data is determined not only for supervised learning with labels but also for unsupervised learning without labels, and this threshold is used to detect whether an anomaly occurs in the time-series data. Detection can be performed not only on data that have already been generated but also on time-series data generated in real time.

FIG. 1 is a block diagram illustrating the configuration of a detection system according to an embodiment, and FIG. 2 is a flowchart illustrating a method of detecting anomalies in time-series data in a detection system according to an embodiment.

The processor of the detection system 100 may include a training unit 110, a distinguishing unit 120, an input unit 130, and a determination unit 140. These components of the processor may be representations of different functions performed by the processor according to control instructions provided by program code stored in the detection system. The processor and its components may control the detection system to perform steps 210 to 240 of the method of detecting anomalies in time-series data shown in FIG. 2. The processor and its components may be implemented to execute instructions according to the code of an operating system and the code of at least one program contained in the memory.

The processor may load into memory the program code stored in a program file for the method of detecting anomalies in time-series data. For example, when the program is executed in the detection system, the processor may control the detection system to load the program code from the program file into memory under the control of the operating system. The processor and each of the training unit 110, the distinguishing unit 120, the input unit 130, and the determination unit 140 included in the processor may be different functional representations of the processor for executing the instructions of the corresponding part of the program code loaded into memory in order to perform the subsequent steps 210 to 240.

In step 210, the training unit 110 may use time-series data to train a recurrent neural network composed of an encoder and a decoder for distinguishing normal data from anomalous data. The training unit 110 may determine whether labels exist in the time-series data. When no labels exist, it may perform an identification process that distinguishes normal data from anomalous data in the time-series data, calculate an anomaly score for each piece of data in the time-series data, and distinguish normal data from anomalous data, once or sequentially, using the calculated anomaly scores. Here, the anomaly score for each piece of data may be the value of the cost function once training of the recurrent neural network using the encoder-decoder has finished.

The training unit 110 may construct, from the time-series data, a data set (HD) of data whose anomaly score is at or above a preset criterion, and train the recurrent neural network composed of the encoder-decoder using a loss function and a data set (LD) from which the data whose anomaly score is at or above the preset criterion have been removed. Through the training, it may calculate the anomaly scores for the data set (LD) and the N-score, which is the maximum of the anomaly scores for the data set (LD), and the anomaly scores for the data set (HD) and the A-score, which is the minimum of the anomaly scores for the data set (HD). When labels exist in the time-series data, the training unit 110 may classify normal data and anomalous data from the time-series data through the training, calculate an N-score for the classified normal data, and calculate an A-score using the classified anomalous data.
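The idea of using the post-training cost as a per-window anomaly score can be sketched in Python as follows. The patent does not name a specific cost function, so mean squared reconstruction error is used here purely as an assumption, and `reconstruct` is a hypothetical stand-in for the trained encoder-decoder.

```python
from typing import Callable, List, Sequence


def anomaly_score(window: Sequence[float],
                  reconstruct: Callable[[Sequence[float]], Sequence[float]]) -> float:
    """Mean squared error between a window and its reconstruction."""
    rebuilt = reconstruct(window)
    if len(rebuilt) != len(window):
        raise ValueError("reconstruction must have the same length as the input")
    return sum((x - y) ** 2 for x, y in zip(window, rebuilt)) / len(window)


def toy_reconstruct(window: Sequence[float]) -> List[float]:
    """Crude stand-in model: reproduces only the mean level of the window."""
    mean = sum(window) / len(window)
    return [mean] * len(window)


if __name__ == "__main__":
    print(anomaly_score([1.0, 1.0, 1.0], toy_reconstruct))  # flat window
    print(anomaly_score([0.0, 0.0, 3.0], toy_reconstruct))  # window with a spike
```

A window the model reproduces well gets a low score, while a window with a shape the model has not learned gets a high one, which is what makes the cost usable as an anomaly score.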

In step 220, the discrimination unit 120 may divide the time-series data into a normal interval, a caution interval, and an anomalous interval based on the scores calculated through the encoder-decoder recurrent neural network. If the calculated N-score is smaller than the calculated A-score, the discrimination unit 120 treats scores below the N-score as the normal zone, scores at or above the N-score and below the A-score as the caution zone, and scores at or above the A-score as the anomalous zone. If the calculated N-score is larger than the calculated A-score, it treats scores below the A-score as the normal zone, scores at or above the A-score and below the N-score as the caution zone, and scores at or above the N-score as the anomalous zone. The discrimination unit 120 searches the time-series data for boundary data that separates normal data from anomalous data, taking as reference a boundary point in the tail of the anomaly-score distribution where the difference is at or above a preset criterion, and uses the boundary data found to detect abnormal situations, covering the normal, caution, and anomalous states of the time-series data. Specifically, the discrimination unit 120 may repeat the following operation to arrive at the final boundary data: using the data left after removing items whose anomaly score is at or above the preset criterion, it again looks for the boundary data showing the largest difference, or the largest ratio, in the tail of the anomaly-score distribution.

In step 230, the input unit 130 may input new time-series data generated in real time to the trained encoder-decoder recurrent neural network.

In step 240, the determination unit 140 may use the trained encoder-decoder recurrent neural network to determine, from the new time-series data generated in real time, whether an abnormal situation has occurred. The determination unit 140 calculates an anomaly score for the new time-series data using the trained network and uses that score to decide whether the data is normal data, caution data, or anomalous data.

FIG. 3 is a diagram for explaining how the detection system detects an abnormal situation in time-series data.

To find the small number of series that represent abnormal situations, the time-series data is fed in as input data. A recurrent-neural-network-based encoder-decoder is configured and trained with the input time-series data itself serving as the label. Because the deep learning parameters are trained in this process to reconstruct the normal data, which makes up the majority, normal data yields a small value of the cost function that measures the error, while anomalous data containing an abnormal situation yields a relatively large value.
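The idea that a model fitted to mostly normal data reconstructs normal series well and anomalous series poorly can be illustrated with a minimal sketch. It does not implement a recurrent neural network: the "model" here is a hypothetical stand-in that always returns the per-time-step mean of the training series, and mean squared error is assumed as the cost function (the description only says a common loss function may be used). The names `mse`, `pointwise_mean`, and `anomaly_score` are illustrative.

```python
from statistics import fmean


def mse(series, reconstruction):
    """Mean squared error between a series and its reconstruction."""
    if len(series) != len(reconstruction):
        raise ValueError("series and reconstruction must have equal length")
    return fmean((x - y) ** 2 for x, y in zip(series, reconstruction))


def pointwise_mean(training_series):
    """Stand-in 'model' (assumption): per-time-step mean of the training series."""
    return [fmean(column) for column in zip(*training_series)]


def anomaly_score(series, reconstruct):
    """Score a series by how poorly `reconstruct` restores it."""
    return mse(series, reconstruct(series))


# Three identical "normal" series and one series that deviates strongly.
train = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [5.0, 0.0, 9.0],
]
template = pointwise_mean(train)
scores = [anomaly_score(s, lambda _series: template) for s in train]
```

In this toy data the deviating series receives a much larger score than the three majority series, which is the property the later thresholding steps rely on.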

FIG. 4 is a diagram for explaining a recurrent neural network composed of an encoder and a decoder. The network may be trained using the time series input to the encoder 420 and the time series output from the decoder 410. Here, training may be carried out so that the time series input to the encoder 420 and the time series output from the decoder 410 are identical; in other words, the input time series is substituted for the target output time series. This technique may be applied in different ways depending on the form of the given time-series data, as described below. Depending on the form of the data, the rate of change of the data may also be derived and used.

Referring to FIG. 3, time-series data 301 may be input as input data. The time-series data is preprocessed by sliding a window of a preset size over it so that every series has the same size. Training is then carried out separately for two cases: the case where the preprocessed time-series data has labels that distinguish normal data from anomalous data, and the case where there are no labels or other information from which normal and anomalous data could be identified. The detection system may determine whether labels exist in the time-series data (302). If no labels exist, the detection system may calculate anomaly scores for the time-series data using the recurrent-neural-network-based encoder-decoder (303). Training the recurrent neural network for anomaly scoring includes adjusting the weights of the nodes of the neural network, and the weights obtained after training form the basis for calculating the anomaly scores that classify new time-series data as being in the normal, caution, or anomalous state. The data with high anomaly scores (HD) may then be collected, and the anomaly scores of HD and the A-score, which is the minimum anomaly score in HD, may be calculated (304). Next, the anomaly scores of the time-series data left after removing the high-scoring data (LD) and the N-score, which is the maximum anomaly score in LD, may be calculated (305). Normal data, caution data, and anomalous data may then be distinguished based on the calculated A-score and N-score (314).

If labels exist, the detection system may train the recurrent-neural-network-based encoder-decoder on the normal data (311). The detection system may calculate the N-score from the normal data (312) and the A-score from the anomalous data (313). Normal data, caution data, and anomalous data in the time-series data may then be distinguished based on the calculated N-score and A-score.
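For the labeled case, the two thresholds reduce to a maximum and a minimum over already-computed anomaly scores. The sketch below assumes the scores have been obtained elsewhere (for example from a trained encoder-decoder) and that labels are given as booleans; the function name `n_and_a_scores` is hypothetical.

```python
def n_and_a_scores(scores, is_anomalous):
    """Return (N-score, A-score) from per-series scores and boolean labels.

    N-score: maximum anomaly score among series labeled normal.
    A-score: minimum anomaly score among series labeled anomalous.
    """
    normal = [s for s, flag in zip(scores, is_anomalous) if not flag]
    anomalous = [s for s, flag in zip(scores, is_anomalous) if flag]
    if not normal or not anomalous:
        raise ValueError("need at least one normal and one anomalous series")
    return max(normal), min(anomalous)


n_score, a_score = n_and_a_scores(
    [0.1, 0.3, 0.2, 0.9, 0.7],
    [False, False, False, True, True],
)
```

Nothing guarantees that the N-score comes out below the A-score, which is why the description treats both orderings.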

As an example, consider the case where time-series data recurs over a fixed period, as in a machine process. Suppose the training data consists of n time series T_i (i = 1, 2, 3, ..., n), each made up of data at m time points, that is, T_i = (t₁, t₂, ..., t_m), and each T_i is data that recurs in the same process or a similar situation. After the data is standardized, the following operations may be carried out.

First, consider the case where it is known whether each T_i contains an abnormal situation. Using only the data that contains no abnormal situation, the RNN-based encoder-decoder may be trained so that its output time series equals its input time series. Any loss function commonly used in deep learning may serve as the loss function. Once training has finished and all parameters are fixed, the loss value obtained for a new input series is called the anomaly score of that series. This anomaly score may vary depending on how training was carried out.

The maximum anomaly score over the normal data is called the N-score, and its value is written M. The minimum anomaly score over the data containing an abnormal situation (the anomalous data) is called the A-score, and its value is written m. If M < m, scores below M form the normal interval, scores from M up to but not including m form the caution interval, and scores of m or more form the anomalous interval. If M > m, scores below m form the normal interval, scores from m up to but not including M form the caution interval, and scores of M or more form the anomalous interval. In either case, new time-series data is judged to be in the normal state, the caution state, or the anomalous state according to the interval that contains its score.
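The three-way decision can be written compactly, because in both orderings the smaller of the two thresholds ends the normal interval and the larger one starts the anomalous interval. The following is a minimal sketch; the function name `classify` and the state strings are illustrative choices, not terms fixed by the description.

```python
def classify(score, n_score, a_score):
    """Map an anomaly score to 'normal', 'caution', or 'anomalous'.

    Works for either ordering of the N-score and A-score: scores below the
    smaller threshold are normal, scores at or above the larger threshold are
    anomalous, and everything in between is caution.
    """
    lower, upper = sorted((n_score, a_score))
    if score < lower:
        return "normal"
    if score < upper:
        return "caution"
    return "anomalous"


# N-score below A-score.
states_a = [classify(s, n_score=0.3, a_score=0.7) for s in (0.1, 0.3, 0.5, 0.7, 2.0)]
# N-score above A-score.
states_b = [classify(s, n_score=0.7, a_score=0.3) for s in (0.1, 0.3, 0.5, 0.7, 2.0)]
```

Both calls yield the same labels, since only the ordering of the two thresholds differs.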

As another example, consider the case where it is not known whether each T_i contains an abnormal situation. The recurrent-neural-network-based encoder-decoder is trained on the entire data set so that its output time series equals its input time series, and the loss value of each training series is then obtained. The loss values are sorted in descending order, and the difference (or ratio) between adjacent terms is computed. If the largest of these differences or ratios exceeds a preset criterion, the terms that produce it are taken as the boundary of the abnormal situation. Specifically, suppose the standardized data consists of n time series T_i (i = 1, 2, 3, ..., n), and consider their losses arranged in descending order. When the largest difference between adjacent sorted losses (or, in the ratio variant, the largest ratio) is greater than a preset value C, the A-score is the smallest loss above that boundary and the N-score is the largest loss below it. Scores below the N-score then form the normal interval, scores from the N-score up to but not including the A-score form the caution interval, and scores of the A-score or more form the anomalous interval, and new time-series data is judged to be in the normal state, the caution state, or the anomalous state according to the interval that contains its score. When the largest difference or ratio is smaller than C, it is determined that no anomalous state is present; the determination may be made anew later as more data is collected or as the value of C is adjusted. Experience or expert knowledge may also be used to choose C.
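The boundary search on sorted losses can be sketched as follows. The sketch makes several assumptions that the text does not settle: it examines every adjacent pair rather than only the tail of the distribution, it treats a largest gap equal to C the same as one below C, and in the ratio variant it skips pairs whose smaller loss is not positive. The name `find_boundary` and its parameters are hypothetical.

```python
def find_boundary(losses, c, use_ratio=False):
    """Find the largest gap between adjacent losses sorted in descending order.

    Returns (n_score, a_score) when the largest gap exceeds `c`, where a_score
    is the loss just above the gap and n_score the loss just below it.
    Returns None when no gap exceeds `c` (no anomalous state is identified).
    """
    ordered = sorted(losses, reverse=True)
    best_gap = None
    best_index = None
    for i in range(len(ordered) - 1):
        high, low = ordered[i], ordered[i + 1]
        if use_ratio:
            if low <= 0:
                continue  # ratio undefined for non-positive losses; skip this pair
            gap = high / low
        else:
            gap = high - low
        if best_gap is None or gap > best_gap:
            best_gap, best_index = gap, i
    if best_gap is None or best_gap <= c:
        return None
    return ordered[best_index + 1], ordered[best_index]


losses = [0.1, 0.12, 0.15, 0.9, 1.0, 0.11]
by_difference = find_boundary(losses, c=0.5)
by_ratio = find_boundary(losses, c=2.0, use_ratio=True)
too_strict = find_boundary(losses, c=1.0)
```

In this toy data the two largest losses are separated from the rest by a wide gap, so both variants place the boundary between 0.9 and 0.15, while a stricter C of 1.0 finds no boundary.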

As another example, consider the case where data in the same state does not recur, as with stock prices. Suppose the time-series data is given as a single sequence and the desired series length is m. The series T_i may then be formed as windows of length m taken from the sequence so that consecutive windows partially overlap.

After that, processing proceeds in the same way as described above for data that recurs over a fixed period.
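Cutting one long sequence into overlapping windows of length m can be sketched as below. The `step` parameter (how far each window is shifted) is an assumption introduced for illustration; the description only requires that consecutive windows partially overlap, which holds whenever `step` is smaller than `m`.

```python
def sliding_windows(sequence, m, step=1):
    """Split `sequence` into windows of length `m`, each shifted by `step`."""
    if m <= 0 or step <= 0:
        raise ValueError("m and step must be positive")
    return [
        list(sequence[start:start + m])
        for start in range(0, len(sequence) - m + 1, step)
    ]


prices = [10, 11, 12, 13, 14]
windows = sliding_windows(prices, m=3)
sparser = sliding_windows(prices, m=3, step=2)
```

Each resulting window can then be treated like one of the series T_i in the recurring-data case.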

The detection system can detect abnormal situations in real time. The extension to data arriving in real time is handled differently for the two cases: data that recurs over a fixed period, and data in the same state that does not recur, such as stock prices. For recurring data, the data observed so far is combined either with the existing data set that contains no abnormal situation or with the data in LD, and the scores of the combined series are calculated to report the normal state, the caution state, or the anomalous state, or the probability of each. For non-recurring data, the score may be calculated from the most recent m data points, including the current one.

As an example, consider the case where data recurs over a fixed period, as in a machine process. Suppose the data T = (r₁, r₂, ..., r_l) has been generated in real time up to the l-th time point, and the data set that contains no abnormal situation, or LD, consists of n time series T_i (i = 1, 2, 3, ..., n), each made up of data at m time points, that is, T_i = (t₁, t₂, ..., t_m), where each T_i is data that recurs in the same process or situation. Combined data may then be formed from T and each T_i, giving n combined series. If a, b, and c are the numbers of combined series whose scores fall in the normal state, the caution state, and the anomalous state respectively, the probabilities of the normal, caution, and anomalous states are a/n, b/n, and c/n.
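The counting step that produces a/n, b/n, and c/n can be sketched as follows. How the partial observation is combined with each reference series is an assumption here: the observed prefix replaces the first l points of the reference and the remaining points are taken from the reference. The scoring and classification functions are passed in as parameters, and `toy_score` and `toy_classify` are hypothetical stand-ins for a trained model and its thresholds.

```python
from collections import Counter

STATES = ("normal", "caution", "anomalous")


def state_probabilities(observed, references, score_fn, classify_fn):
    """Estimate state probabilities for a partially observed series.

    Each reference series contributes one combined series: the observed prefix
    followed by the remainder of the reference (assumed combination rule).
    """
    if not references:
        raise ValueError("at least one reference series is required")
    seen = len(observed)
    counts = Counter()
    for reference in references:
        combined = list(observed) + list(reference[seen:])
        counts[classify_fn(score_fn(combined))] += 1
    n = len(references)
    return {state: counts[state] / n for state in STATES}


def toy_score(series):
    """Stand-in score (assumption): largest absolute value in the series."""
    return max(abs(x) for x in series)


def toy_classify(score):
    """Stand-in thresholds (assumption): 1.5 and 3.0."""
    if score < 1.5:
        return "normal"
    if score < 3.0:
        return "caution"
    return "anomalous"


references = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
    [0.0, 0.0, 0.0, 0.0],
]
probabilities = state_probabilities([0.0, 1.0], references, toy_score, toy_classify)
```

With these four references, two combined series score as normal, one as caution, and one as anomalous.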

As another example, for data that does not recur, such as stock prices, a time series is formed from the current data point together with the previous m-1 data points, and the score of this series is used to determine the normal state, the caution state, or the anomalous state.
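Keeping the most recent m data points of a stream can be sketched with a bounded buffer. The class name `LatestWindow` is hypothetical, and returning `None` until m points have arrived is a design choice for this sketch rather than something the description specifies.

```python
from collections import deque


class LatestWindow:
    """Hold the most recent m data points of a stream."""

    def __init__(self, m):
        if m <= 0:
            raise ValueError("m must be positive")
        self._buffer = deque(maxlen=m)

    def push(self, value):
        """Add a point; return the current window once m points are available."""
        self._buffer.append(value)
        if len(self._buffer) == self._buffer.maxlen:
            return list(self._buffer)
        return None


window = LatestWindow(3)
outputs = [window.push(v) for v in (1, 2, 3, 4)]
```

Each non-`None` window would then be scored by the trained network and mapped to a state.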

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of examples and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

An abnormal situation detection method performed by a detection system, the method comprising:
inputting first time-series data generated in real time to a recurrent neural network composed of a trained encoder and decoder; and
determining whether an abnormal situation has occurred from the first time-series data using the recurrent neural network composed of the trained encoder and decoder,
wherein the recurrent neural network composed of the trained encoder and decoder has been trained to distinguish a normal interval, a caution interval, and an anomalous interval of an anomaly score determined based on a loss value of second time-series data used for training, and
wherein the determining comprises using the recurrent neural network composed of the trained encoder and decoder to determine which of a normal state, a caution state, and an anomalous state the first time-series data is in, according to which of the normal interval, the caution interval, and the anomalous interval contains the anomaly score of the first time-series data.

The abnormal situation detection method of claim 1,
wherein the recurrent neural network composed of the trained encoder and decoder is implemented to determine whether a label exists in the second time-series data and, if no label exists in the second time-series data, to perform an identification process that distinguishes normal data from anomalous data in the second time-series data, calculate an anomaly score for each item of the second time-series data, and distinguish normal data from anomalous data once or sequentially using the calculated anomaly scores.

The abnormal situation detection method of claim 2,
wherein the recurrent neural network composed of the trained encoder and decoder is implemented to:
construct, from the second time-series data, a data set (HD) having anomaly scores at or above a preset criterion, and train the recurrent neural network composed of the encoder and decoder using a loss function and a data set (LD) from which the data having anomaly scores at or above the preset criterion has been removed; calculate, through the training, the anomaly scores of the data set (LD) and an N-score that is the maximum anomaly score of the data set (LD); and calculate, through the training, the anomaly scores of the data set (HD) and an A-score that is the minimum anomaly score of the data set (HD);
if the calculated N-score is smaller than the calculated A-score, determine scores below the calculated N-score as a normal zone, scores at or above the calculated N-score and below the calculated A-score as a caution zone, and scores at or above the calculated A-score as an anomalous zone, and, if the calculated N-score is larger than the calculated A-score, determine scores below the calculated A-score as a normal zone, scores at or above the calculated A-score and below the calculated N-score as a caution zone, and scores at or above the calculated N-score as an anomalous zone; and
search the second time-series data for boundary data that distinguishes normal data from anomalous data, based on a boundary point in the tail of the distribution of the anomaly scores that shows a difference at or above a preset criterion, and detect abnormal situations using the boundary data found.

The abnormal situation detection method of claim 2,
wherein the recurrent neural network composed of the trained encoder and decoder is implemented to:
if a label exists in the second time-series data, classify the second time-series data into normal data and anomalous data, calculate through the training an N-score for the classified normal data, and calculate an A-score using the classified anomalous data; and
if the calculated N-score is smaller than the calculated A-score, determine scores below the calculated N-score as a normal zone, scores at or above the calculated N-score and below the calculated A-score as a caution zone, and scores at or above the calculated A-score as an anomalous zone, and, if the calculated N-score is larger than the calculated A-score, determine scores below the calculated A-score as a normal zone, scores at or above the calculated A-score and below the calculated N-score as a caution zone, and scores at or above the calculated N-score as an anomalous zone.

(Deleted)

A program stored in a computer-readable storage medium to execute an abnormal situation detection method performed by a detection system, the method comprising:
inputting first time-series data generated in real time to a recurrent neural network composed of a trained encoder and decoder; and
determining whether an abnormal situation has occurred from the first time-series data using the recurrent neural network composed of the trained encoder and decoder,
wherein the recurrent neural network composed of the trained encoder and decoder has been trained to distinguish a normal interval, a caution interval, and an anomalous interval of an anomaly score determined based on a loss value of second time-series data used for training, and
wherein the determining comprises using the recurrent neural network composed of the trained encoder and decoder to determine which of a normal state, a caution state, and an anomalous state the first time-series data is in, according to which of the normal interval, the caution interval, and the anomalous interval contains the anomaly score of the first time-series data.

The program stored in a computer-readable storage medium of claim 6,
wherein the recurrent neural network composed of the trained encoder and decoder is implemented to:
determine whether a label exists in the second time-series data; if no label exists in the second time-series data, perform an identification process that distinguishes normal data from anomalous data in the second time-series data, construct from the second time-series data a data set (HD) having anomaly scores at or above a preset criterion, train the recurrent neural network composed of the encoder and decoder using a loss function and a data set (LD) from which the data having anomaly scores at or above the preset criterion has been removed, calculate through the training the anomaly scores of the data set (LD) and an N-score that is the maximum anomaly score of the data set (LD), and calculate through the training the anomaly scores of the data set (HD) and an A-score that is the minimum anomaly score of the data set (HD); if a label exists in the second time-series data, classify the second time-series data into normal data and anomalous data through the training, calculate an N-score for the classified normal data, and calculate an A-score using the classified anomalous data; and divide the scores into zones as follows:
If the calculated N-score is smaller than the calculated A-score, scores below the calculated N-score are determined as a normal zone, scores at or above the calculated N-score and below the calculated A-score as a caution zone, and scores at or above the calculated A-score as an anomalous zone; and if the calculated N-score is larger than the calculated A-score, scores below the calculated A-score are determined as a normal zone, scores at or above the calculated A-score and below the calculated N-score as a caution zone, and scores at or above the calculated N-score as an anomalous zone.
A program stored in a computer-readable storage medium, characterized by the above.

According to claim 6,
The recurrent neural network composed of the trained encoder and decoder,
Boundary data for distinguishing normal data from singular data is searched for in the second time series data based on a boundary point having a difference greater than or equal to a preset standard in a tail portion of the distribution of singular scores, and the searched boundary is searched for. Implemented to detect an abnormal situation including a normal state, an attention state, and an unusual state of the first time series data using data
A program stored in a computer-readable storage medium, characterized by the above.
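One possible reading of the boundary search in this claim is sketched below in Python. The claim does not say how the "difference" is measured or how much of the distribution counts as the tail, so the sketch makes two loudly labeled assumptions: the difference is the gap between consecutive sorted singular scores, and the tail is the top `tail_fraction` of the sorted scores. The names `find_tail_boundary`, `min_gap`, and `tail_fraction` are hypothetical.

```python
from typing import Optional, Sequence


def find_tail_boundary(
    scores: Sequence[float],
    min_gap: float,
    tail_fraction: float = 0.1,
) -> Optional[float]:
    """Return the first score in the upper tail that follows a gap >= min_gap.

    Assumptions (not from the patent): 'difference' means the gap between
    consecutive sorted scores, and the tail is the top `tail_fraction`.
    Returns None when no such gap exists.
    """
    ordered = sorted(scores)
    if len(ordered) < 2:
        return None
    # Index where the upper tail starts; always leave at least one gap to inspect.
    start = min(int(len(ordered) * (1.0 - tail_fraction)), len(ordered) - 2)
    for i in range(start, len(ordered) - 1):
        if ordered[i + 1] - ordered[i] >= min_gap:
            # Scores at or above this value are treated as singular candidates.
            return ordered[i + 1]
    return None
```

Scores at or above the returned value would form the candidate singular data, and the rest the candidate normal data; a return value of `None` means no clear boundary was found under these assumptions.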

In a detection system for detecting an abnormal situation,
an input unit for inputting first time-series data generated in real time to a recurrent neural network composed of a trained encoder and decoder; and
a determination unit for determining whether an abnormal situation has occurred from the first time-series data using the recurrent neural network composed of the trained encoder and decoder
are included,
The recurrent neural network composed of the trained encoder and decoder,
is trained to distinguish a normal section, an attention section, and a singular section of a singular score determined based on a loss value of the second time series data used for training,
The determination unit,
using the recurrent neural network composed of the trained encoder and decoder, determines whether the corresponding data is in the normal state, the attention state, or the singular state according to which of the normal section, the attention section, and the singular section contains the singular score of the first time series data
A detection system characterized by the above.
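The claim ties the singular score to a loss value but does not state which loss is used. As a hedged illustration only, the Python sketch below assumes a mean squared reconstruction error between an input window and the decoder's reconstruction of it. The function name `singular_score` and the choice of mean squared error are assumptions, and the encoder-decoder network itself is not modeled.

```python
from typing import Sequence


def singular_score(window: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared reconstruction error of one window.

    Assumption: the loss value behind the singular score is a mean squared
    error; the patent text here only says 'a loss value'.
    """
    if len(window) == 0 or len(window) != len(reconstruction):
        raise ValueError("window and reconstruction must be non-empty and equal in length")
    return sum((x - y) ** 2 for x, y in zip(window, reconstruction)) / len(window)
```

A window that the trained network reconstructs well yields a score near zero, while a poorly reconstructed window yields a larger score that can then be compared against the section boundaries.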

According to claim 9,
The recurrent neural network composed of the trained encoder and decoder,
determines whether a label exists in the second time series data; if no label exists in the second time series data, performs an identification process that distinguishes normal data from singular data in the second time series data, in which it calculates a singular score for each data item in the second time series data, distinguishes normal data from singular data once or sequentially using the calculated singular scores, constructs a data set (HD) of data having a singular score greater than or equal to a preset criterion among the second time series data, trains the recurrent neural network composed of the encoder and the decoder using a data set (LD), from which the data having a singular score greater than or equal to the preset criterion has been removed, and a loss function, calculates, through the training, the singular scores of the data set (LD) and an N-score, which is their maximum value, and calculates, through the training, the singular scores of the data set (HD) and an A-score, which is their minimum value; and, if a label exists in the second time series data, classifies normal data and singular data in the second time series data through the training, calculates the N-score from the classified normal data, and calculates the A-score from the classified singular data,
and is implemented such that, if the calculated N-score is less than the calculated A-score, scores less than the calculated N-score are classified as a normal section, scores greater than or equal to the calculated N-score and less than the calculated A-score as an attention section, and scores greater than or equal to the calculated A-score as a singular section; and, if the calculated N-score is greater than the calculated A-score, scores less than the calculated A-score are classified as a normal section, scores greater than or equal to the calculated A-score and less than the calculated N-score as an attention section, and scores greater than or equal to the calculated N-score as a singular section
A detection system characterized by the above.
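For the unlabeled case in this claim, the split into the data sets HD and LD and the derivation of the N-score and A-score can be sketched in Python as follows. The sketch works on plain lists of singular scores and does not model the network or its training; in the claim, the scores used for the N-score and A-score are the ones obtained through training on LD, so callers would pass those recomputed scores to the second function. The function names `split_by_criterion` and `n_and_a_scores` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_criterion(
    scores: Sequence[float], criterion: float
) -> Tuple[List[float], List[float]]:
    """Split singular scores into LD (< criterion) and HD (>= criterion)."""
    ld = [s for s in scores if s < criterion]
    hd = [s for s in scores if s >= criterion]
    return ld, hd


def n_and_a_scores(
    ld_scores: Sequence[float], hd_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (N-score, A-score): the maximum over LD and the minimum over HD."""
    if not ld_scores or not hd_scores:
        raise ValueError("both LD and HD must contain at least one score")
    return max(ld_scores), min(hd_scores)
```

Because the scores are recomputed after training, the N-score can end up above or below the A-score, which is why the claim spells out both orderings.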

According to claim 9,
The recurrent neural network composed of the trained encoder and decoder,
is implemented to search the second time series data for boundary data that distinguishes normal data from singular data, based on a boundary point in the tail portion of the distribution of singular scores at which the difference is greater than or equal to a preset criterion, and to detect an abnormal situation, including a normal state, an attention state, and a singular state, of the first time series data using the retrieved boundary data
A detection system characterized by the above.