KR102310490B1

KR102310490B1 - The design of GRU-based cell structure robust to missing value and noise of time-series data in recurrent neural network

Info

Publication number: KR102310490B1
Application number: KR1020180048801A
Authority: KR
Inventors: 오혜연; 박성준; 박정국
Original assignee: 한국과학기술원
Priority date: 2018-04-27
Filing date: 2018-04-27
Publication date: 2021-10-08
Also published as: KR20190124846A; WO2019208998A1

Abstract

예측하고자 하는 문제에 맞추어 시계열 데이터의 결측값 대치 및 노이즈 완화가 동시에 가능한 재귀적 인공 신경망 모델을 제공하는 것으로서, (a) 시계열 데이터에서 학습가능한 노이즈 완화 필터를 이용한 가중평균 방법으로 노이즈를 완화하는 단계, (b) 결측값을 대치하는 단계, (c) GRU연산을 통해 현재 시점에서 기억해야하는 정보를 잠재 상태 벡터에 저장하는 단계를 단일 셀 구조에 모두 포함하는 것을 특징으로 한다. 또한, 본 발명은 재귀적 인공 신경망 모델을 구성함에 있어서, 상기 (a) 단계에서, 셀 구조 내에 포함된 노이즈 완화를 위한 가중치 파라미터가 재귀적 인공 신경망 모델을 예측 과제에 적합하도록 학습하는 과정에서, 과제에 최적화되도록 학습되는 것을 특징으로 한다. 상기와 같은 방법에 의하여, 별도의 전처리 없이 시계열 데이터의 결측값 대치 및 노이즈 완화를 동시에 수행하는 재귀적 인공 신경망 모델을 활용해, 다양한 기계학습 과제에 활용할 수 있다.To provide a recursive artificial neural network model capable of simultaneously imputing missing values of time series data and mitigating noise according to the problem to be predicted, (a) mitigating noise by a weighted average method using a noise mitigation filter that can be learned from time series data , (b) replacing missing values, and (c) storing information that needs to be memorized at the current point in time through GRU operation in a latent state vector are all included in a single cell structure. In addition, in the present invention, in constructing a recursive neural network model, in the step (a), in the process of learning the recursive neural network model so that the weight parameter for noise mitigation included in the cell structure is suitable for the prediction task, It is characterized in that it is learned to be optimized for the task. By the above method, a recursive artificial neural network model that simultaneously performs replacement of missing values and noise mitigation of time series data without separate preprocessing can be utilized for various machine learning tasks.

Description

{ The design of GRU-based cell structure robust to missing value and noise of time-series data in recurrent neural network }

본 발명은 재귀적 신경망 (Recurrent Neural Network) 모델을 설계할 때 사용되는 셀 (Cell) 의 내부 구조에 가중치 평균 필터 (Weighted Average Filter) 를 추가한 뒤, 신경망 모델을 학습시킬 때 필터의 매개 변수를 함께 학습하여, 결측치 및 노이즈가 있는 시계열 입력 데이터를 학습하여 예측 과제를 수행할 수 있는 GRU (Gated Recurrent Unit) 기반의 새로운 재귀적 신경망 모델에 관한 것이다.The present invention adds a Weighted Average Filter to the internal structure of a cell used when designing a recurrent neural network model, and then sets the parameters of the filter when training the neural network model. It relates to a new recursive neural network model based on GRU (Gated Recurrent Unit) that can learn together and perform a prediction task by learning time series input data with missing values and noise.

일반적으로, 시계열 데이터를 신경망 모델 등의 분류기에 사용할 때 데이터에 포함된 데이터 누락 및 노이즈는 전처리 과정에서 처리되어 사용된다. 전처리 과정에서 누락 된 데이터는 전체 평균값이나 가중치 평균값, 또는 선형 회귀법이나 서포트 벡터 머신 기반 회귀법 등의 알고리즘을 사용하여 대치된다. 또한 데이터의 노이즈는 이동 평균 필터, 웨이브렛 필터, 퍼지 로직 등의 방법을 통해 완화된다.In general, when time series data is used in a classifier such as a neural network model, data omissions and noise included in the data are processed and used in the preprocessing process. In the preprocessing process, the missing data is replaced using an algorithm such as the overall mean value or the weighted mean value, or a linear regression method or a support vector machine-based regression method. In addition, data noise is mitigated through methods such as a moving average filter, a wavelet filter, and a fuzzy logic.

그러나 이러한 데이터 누락 및 노이즈 처리 기법은 신경망 모델의 목적 시스템과는 무관하게 적용된다는 한계가 존재한다. 따라서 신경망 모델이 입력 데이터를 통해 학습될 때 이미 전처리를 마친 데이터는 수정되지 않으며, 이 때문에 신경망 모델의 구조나 목적 시스템의 특성 등이 데이터 누락 및 노이즈 처리 과정에 효과적으로 반영될 수 없게 된다.However, there is a limitation that these data omission and noise processing techniques are applied regardless of the target system of the neural network model. Therefore, when the neural network model is learned from the input data, the data that has already been pre-processed is not modified, and for this reason, the structure of the neural network model or the characteristics of the target system cannot be effectively reflected in the data omission and noise processing process.

상기와 같은 데이터 누락 및 노이즈 처리 방법의 한계를 해결하고자 신경망 모델 내부에서 셀 구조를 변형하여 누락 된 데이터를 대치하고 노이즈를 완화하는 접근 방법이 제시되고 있다. 셀의 구조를 통해 데이터 전처리를 시행할 경우 신경망 모델의 학습 과정에서 전처리에 사용된 함수의 매개변수가 같이 학습될 수 있다는 장점이 있다. 그러나 아직까지 셀 구조의 누락 데이터의 대치 및 노이즈 완화를 동시에 수행하는 셀 구조는 제시된 바가 없으며, 목적 시스템의 정확도 향상에 대한 개선의 여지가 남아있는 상태이다.In order to solve the limitations of the data omission and noise processing methods as described above, an approach for replacing missing data and mitigating noise by modifying the cell structure inside the neural network model has been proposed. When data preprocessing is performed through the cell structure, there is an advantage that the parameters of the function used for preprocessing can be learned together in the training process of the neural network model. However, a cell structure that simultaneously performs replacement of missing data and noise mitigation of the cell structure has not been proposed, and there is still room for improvement in improving the accuracy of the target system.

KRUS 10201601026901020160102690 AA KRUS 10201800076571020180007657 AA

Che, Zhengping, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. "Recurrent neural networks for multivariate time series with missing values." Scientific reports 8, no. 1 (2018): 6085.Che, Zhengping, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. "Recurrent neural networks for multivariate time series with missing values." Scientific reports 8, no. 1 (2018): 6085.

본 발명의 목적은 상술한 바와 같은 문제점을 해결하기 위한 것으로, 재귀적 신경망 모델을 설계할 때 가중치 평균 필터가 내장된 GRU 기반의 셀을 사용하여, 신경망을 학습할 때 각 셀에 존재하는 필터의 매개 변수를 함께 학습하여, 데이터 누락 및 노이즈가 존재하는 시계열 데이터에 대해 별도의 전처리 없이 재귀적 신경망 모델의 학습을 가능하게 하는 방법을 제공하는 것이다.An object of the present invention is to solve the above-described problems, and when designing a recursive neural network model, a GRU-based cell with a built-in weighted average filter is used, and when learning a neural network, a filter present in each cell is used. It is to provide a method that enables training of a recursive neural network model without separate preprocessing for time series data in which data omission and noise exist by learning parameters together.

특히, 본 발명에서 제시하는 셀 구조는 결측값에 대한 대치가 가능하고, 학습가능하고 유연한 가중치 평균 필터를 통한 노이즈의 완화를 동시에 수행하는 방법을 제공하여, 예측하고자 하는 문제에 맞추어 노이즈 완화 필터의 매개 변수를 함께 학습하는 학습 알고리즘을 제공하는 것이다.In particular, the cell structure proposed in the present invention provides a method for simultaneously performing noise mitigation through a weighted averaging filter that can replace missing values, is learnable and flexible, so that the noise mitigation filter can be used in accordance with the problem to be predicted. It is to provide a learning algorithm that learns parameters together.

상기 목적을 달성하기 위해 본 발명은 예측하고자 하는 문제에 맞추어 시계열 데이터의 결측값 대치 및 노이즈 완화가 동시에 가능한 재귀적 인공 신경망 모델을 제공하는 것으로서, (a) 시계열 데이터에서 학습가능한 노이즈 완화 필터를 이용한 가중평균 방법으로 노이즈를 완화하는 단계, (b) 결측값을 대치하는 단계, (c) GRU연산을 통해 현재 시점에서 기억해야하는 정보를 잠재 상태 벡터에 저장하는 단계를 단일 셀 구조에 모두 포함하는 것을 특징으로 한다.In order to achieve the above object, the present invention provides a recursive artificial neural network model capable of simultaneously replacing missing values and noise mitigation of time series data in accordance with a problem to be predicted, (a) using a noise mitigation filter that can be learned from time series data Including all of the steps of mitigating noise with a weighted average method, (b) replacing missing values, and (c) storing information that needs to be remembered at the current point in time through GRU operation in a latent state vector in a single cell structure characterized.

또, 본 발명은 재귀적 인공 신경망 모델을 구성함에 있어서, 상기 (a) 단계에서, 셀 구조 내에 포함된 노이즈 완화를 위한 가중치 파라미터가 재귀적 인공 신경망 모델을 예측 과제에 적합하도록 학습하는 과정에서, 과제에 최적화되도록 학습되는 것을 특징으로 한다.In addition, in the process of learning the recursive artificial neural network model so that, in the step (a), the weight parameter for noise mitigation included in the cell structure is suitable for the prediction task, It is characterized in that it is learned to be optimized for the task.

상술한 바와 같이, 본 발명에 따른 데이터 누락 및 노이즈 강건한 GRU 셀을 사용하여 재귀적 신경망을 구성할 경우, 데이터 누락 및 노이즈가 존재하는 시계열 데이터 대해 작동하는 임의의 재귀적 신경망의 목적 시스템에 대한 성능이 개선되는 효과가 얻어진다. As described above, when a recursive neural network is constructed using a GRU cell that is robust to data omission and noise according to the present invention, the performance of any recursive neural network operating on time-series data in which data omission and noise are present on the target system This improved effect is obtained.

도 1은 기존의 GRU 셀 구조에 대한 블록도.
도 2는 본 발명에서 제시하는 셀 구조에 대한 블록도.1 is a block diagram of a conventional GRU cell structure.
Figure 2 is a block diagram of the cell structure presented in the present invention.

이하, 본 발명의 실시를 위한 구체적인 내용을 도면에 따라서 설명한다. 본 발명이 제시하는 셀 구조는 아래와 같음.Hereinafter, specific contents for carrying out the present invention will be described with reference to the drawings. The cell structure proposed by the present invention is as follows.

1. 길이가 T인 시계열 데이터의 각 시점 t는 N개의 특징을 나타내는 N차원 벡터로 구성되어 있으며, 결측값과 노이즈를 포함하고 있을 수 있음. 1. Each time point t of time series data of length T consists of N-dimensional vectors representing N features, and may contain missing values and noise.

2. 노이즈 제거 레이어를 이용해 노이즈를 제거함. 레이어가 산출하는 값은 시점 t를 기준으로 가장 최근에 관측된 입력값

으로, 노이즈가 제거된 값임.2. Remove noise using denoising layer. The value calculated by the layer is the most recently observed input value based on time t.

, which is a value from which noise has been removed.

2. 1. 시점 t의 N차원 벡터에서 d번째 차원의 값

가 관측되었다면, 이 값을 그대로 사용함. 반대로, 시점 t의 N차원 벡터에서 d번째 차원의 값

가 관측되지 않았다면, 해당 차원에서 가장 최근에 관측되었던 값

을 사용함.

은

가 관측되었는지 여부를 나타내며, 관측되었다면 1, 아니면 0의 값을 갖는 마스크임.2. 1. The d-dimensional value of the N-dimensional vector at time t

is observed, this value is used as it is. Conversely, the d-dimensional value of the N-dimensional vector at time t

If is not observed, the most recently observed value in that dimension

used.

silver

Indicates whether or not is observed, a mask with a value of 1 if observed and 0 otherwise.

2. 2.

는 시점 t-k부터 시점 t까지의

들의 가중 평균임.

는 가장 최근에 계산된 결측값이 대치된 k개의 시점별

의 가중 평균으로서,

는 k차원의 각 시점별 가중치를 나타내는 학습가능한 파라미터임.2. 2.

is from time tk to time t

is the weighted average of them.

is for each k time points where the most recently calculated missing values are imputed.

As the weighted average of

is a learnable parameter representing the weight for each time point in the k dimension.

즉, 노이즈 제거 레이어가 산출하는

는, 시점 t를 기준으로 가장 최근에 관측된 노이즈가 제거된

임.That is, the noise removal layer calculates

is the most recently observed noise with respect to time t.

Lim.

3. 앞선 레이어에서 산출한 노이즈가 제거된 값을 기반으로 결측값을 대치함. 레이어가 산출하는 값은 노이즈가 제거되고 결측값이 고려된

임. 자세한 과정을 수식으로 나타내면 아래와 같음.3. Replace missing values based on the noise-removed values calculated in the previous layer. The value calculated by the layer is calculated with noise removed and missing values considered.

Lim. The detailed process is expressed in formulas as follows.

3. 1. 시점 t의 N차원 벡터에서 d번째 차원의 값

가 관측되었다면, 노이즈가 제거된

값을 그대로 사용함.3. 1. The d-dimensional value of the N-dimensional vector at time t

is observed, the noise has been removed

Use the value as it is.

3. 2. 시점 t의 N차원 벡터에서 d번째 차원의 값

가 관측되지 않았다면, 감쇄율

를 적용한

를 사용함. 즉, 현재 시점 t로부터 마지막으로

가 관측되었던 시점까지 흐른 시간

에 비례하여, 지수적 감쇄(exponential decay)율을 적용함.3. 2. The d-dimensional value of the N-dimensional vector at time t

If is not observed, the decay rate

applied

is used. That is, from the current time t to the last

time elapsed until the point in time when was observed

In proportion to , an exponential decay rate is applied.

감쇄율

은

에 비례하여 증가하며,

와

는 감쇄율을 결정하기 위해서 데이터에 따라 학습될 수 있는 파라미터임. 감쇄율

은 0과 1사이의 값을 가질 수 있는데, 이 값이 1에 가까워질수록 가장 최근에 관측되었던 값

대신, 전체 평균 혹은 임의의 상수

를 설정하여, 모든 입력값은 일정 시간 이상 관측값이 제공되지 않으면

으로 수렴하도록 함.decay rate

silver

increases in proportion to

Wow

is a parameter that can be learned according to the data to determine the decay rate. decay rate

can have a value between 0 and 1, and as this value approaches 1, the most recently observed value

Instead, an overall average or an arbitrary constant

By setting , all input values are returned if no observations are provided for a certain period of time.

to converge to .

4. 결측값과 노이즈가 처리된 값

을 바탕으로 GRU 연산을 수행함. 4. Missing and noise-treated values

GRU operation is performed based on

시점 t의 입력값 벡터

는 리셋 게이트(reset gate)

와 업데이트 게이트(update gate)

를 계산하기 위해 사용됨. 각 게이트는 이전 시점의 잠재 상태(hidden state)

과 입력값

을 이용해 산출된 0과 1사이의 값으로써,

는 후보 잠재 상태(candidate hidden state)

를 계산할 때 이전 시점의 잠재 상태

를 얼마나 반영할지를 나타내며,

는 현재 시점의 잠재 상태

를 계산할 때 이전 시점의 잠재 상태

를 얼마나 반영할지를 나타냄. input vector at time t

is the reset gate

and update gate

used to calculate Each gate has a hidden state from the previous point in time.

and input

As a value between 0 and 1 calculated using

is a candidate hidden state.

the latent state at a previous point in time when calculating

indicates how much to reflect

is the current latent state

the latent state at a previous point in time when calculating

indicates how much to reflect.

5. 본 발명의 셀 구조에서 최종적으로 얻게 되는 값은 현재 시점의 잠재 상태

이다. 이는 이전 시점까지 처리한 정보

와 현재 시점의 원시 데이터(raw data)를 조합하여, 시계열 자료에서 과제 수행을 위해 이번 시점에 기억해야 하는 정보를 벡터로 표현한 것임.5. The value finally obtained in the cell structure of the present invention is the current latent state

am. This is information that has been processed up to a previous point in time.

and the raw data of the current time are combined to express the information that needs to be remembered at this time in order to perform the task in the time series data as a vector.

: 시점 t의 N차원 벡터에서 d번째 차원의 값.

: 시점 t를 기준으로, N차원 벡터에서 d번째 차원의 값 중에서 가장 최근에 관측된 값. 만약 시점 t-1에서 이 값이 마지막으로 관측되고 시점 t에서의 이 값(

)이 결측값이라면,

임. (감쇄율을 고려하지 않았을 경우)

: 시점 t의 N차원 벡터에서 d번째 차원의 값의 결측값 여부를 나타내는 마스크 값. 만약

가 결측값이 아니라면 1,

가 결측값이라면 0의 값을 가짐.

: 결측값이 발생했을 때, 가장 최근에 관측된 값

를 얼마나

에 반영할지 결정하는 감쇄율을 나타내며, 0과 1사이의 값을 가질 수 있음.

: 감쇄율을 결정하기 위한 입력값으로,

과

간의 시간차를 나타냄. 즉, 현재로부터 마지막으로 어떤 값이 관측된 시점까지의 거리를 나타냄. 예를 들어, 현재 시점이 t이고

가 t-1 시점에 관측되었다면

는 1임.

: 감쇄율을 결정하기 위해

에 곱해지는 학습가능한 파라미터.

: 감쇄율을 결정하기 위해

에 더해지는 학습가능한 파라미터.

: 노이즈 제거 레이어에서,

를 계산하기 위해 각

에 곱해지는 가중치.

: 각 시점별

에 가중치

를 곱하여 구해진, 결측값이 대치되고 노이즈가 제거된 시점 t의 N차원 벡터에서 d번째 차원의 값.

: 후보 잠재 상태. 이번 시점에 들어온 입력값을 활용하여, 이번 시점의 잠재 상태에 대한 후보를 생성함.

: 현재 시점의 잠재 상태. 지난 시점의 잠재 상태와 이번 시점의 후보 잠재 상태를 바탕으로 계산된, 현재 시점에서 기억해야 하는 정보가 벡터 표현으로 나타난 것.

: GRU 연산에서 사용되는 업데이트 게이트. 이전 시점의 잠재 상태(hidden state)

에 파라미터

를 내적하고, 입력값

에

를 내적한 다음

을 더한 값에 시그모이드 활성함수를 적용하여 산출된 0과 1사이의 값으로, 현재 시점의 잠재 상태

를 계산할 때 이전 시점의 잠재 상태

를 얼마나 반영할지를 나타냄.

: GRU 연산에서 사용되는 리셋 게이트. 이전 시점의 잠재 상태(hidden state)

에 파라미터

를 내적하고, 입력값

에

를 내적한 다음

을 더한 값에 시그모이드 활성함수를 적용하여 산출된 0과 1사이의 값으로, 후보 잠재 상태(candidate hidden state)

를 계산할 때 이전 시점의 잠재 상태

를 얼마나 반영할지를 나타냄.

: The d-dimensional value of the N-dimensional vector at time t.

: The most recently observed value among the d-dimensional values in the N-dimensional vector with respect to the time point t. If this value was last observed at time t-1 and this value at time t (

) is a missing value,

Lim. (If the attenuation rate is not taken into account)

: A mask value indicating whether the d-dimensional value is missing in the N-dimensional vector at time t. if

1 if is not a missing value,

has a value of 0 if is a missing value.

: When a missing value occurs, the most recently observed value

how much

It represents the rate of decay that determines whether to reflect in , and can have a value between 0 and 1.

: As an input value to determine the decay rate,

class

represents the time difference between them. That is, it represents the distance from the present time to the last time a certain value was observed. For example, if the current time point is t,

If is observed at time t-1

is 1.

: to determine the decay rate

A learnable parameter multiplied by .

: to determine the decay rate

A learnable parameter added to .

: In the noise removal layer,

each to calculate

weight multiplied by .

: for each time point

weighted on

The d-dimensional value in the N-dimensional vector at the time t at which missing values are replaced and noise is removed, obtained by multiplying by .

: Candidate potential status. By using the input values received at this time, a candidate for the latent state at this time is generated.

: The latent state of the present moment. Information that needs to be remembered at the present time, calculated based on the latent state of the past time and the candidate latent state of this time, is expressed as a vector expression.

: Update gate used in GRU operation. The hidden state at a previous point in time

parameter on

dot product, and input

to

then dot product

A value between 0 and 1 calculated by applying the sigmoid activation function to the value added by

the latent state at a previous point in time when calculating

indicates how much to reflect.

: Reset gate used in GRU operation. The hidden state at a previous point in time

parameter on

dot product, and input

to

then dot product

A value between 0 and 1 calculated by applying the sigmoid activation function to the added value of the candidate hidden state.

the latent state at a previous point in time when calculating

indicates how much to reflect.

Claims

In the cell structure of the recursive artificial neural network model,
Each time point t of time series data of length T consists of an N-dimensional vector containing missing values and noise,
(a) a layer that mitigates noise by a weighted average method using a noise reduction filter that can be learned from time series data;
(b) a layer for replacing missing values, and
(c) including a layer that stores information that needs to be remembered at the current point in the latent state vector through GRU operation,
The (a) layer is
The d-dimensional input of the N-dimensional vector at time t

is observed, the observed input value at time t

is used as is and the input value at time t

is not observed, the most recently observed input in the d dimension

use ,
The most recently observed input value at time t

Inputs for k time points with the most recently computed missing values imputed for

Input value with noise removed by weighted average of

to calculate,
The (b) layer is
Input value with noise removed

By imputing missing values based on
The d-dimensional input of the N-dimensional vector at time t

is observed, the noise-removed input value

Leave as is and input value at time t

If is not observed, the input value with attenuation applied

use ,
The (c) layer is
Missing and noise-processed inputs

GRU operation is performed based on
Current latent state according to GRU operation

, the latent state at a previous point in time

Expressing the information that needs to be remembered at the current point in time as a latent state vector by combining the raw data at the current point in time.
Cell structure of a recursive artificial neural network model characterized by