KR20230039902A

KR20230039902A - Method and apparatus for dynamic ultra-high-dimensional feature selection for time series prediction and causal factor analysis

Info

Publication number: KR20230039902A
Application number: KR1020210122826A
Authority: KR
Inventors: 성시현; 전윤호; 유진선
Original assignee: 주식회사 모플
Priority date: 2021-09-15
Filing date: 2021-09-15
Publication date: 2023-03-22

Abstract

Disclosed are a dynamic ultra-high-dimensional feature selection method and apparatus for time series prediction and causal factor analysis. The method, performed by a computer device according to one embodiment, comprises a step of selecting and learning causal factors that are helpful in predicting a predicted value in time series prediction, wherein the causal factors may comprise at least one of metadata directly connected to the predicted value, time-series input data, and input covariates. The invention can thereby reduce the input dimension.

Description

Dynamic ultra-high-dimensional feature selection method and apparatus for time series prediction and causal factor analysis

The following embodiments relate to a dynamic ultra-high-dimensional feature selection method and apparatus for time series prediction and causal factor analysis.

Time-series data refers to data expressed as a function of time over a given period. Such time-series data can be predicted by analyzing past time-series data.

One known method of predicting time-series data clusters the time-series data generated during a past training period into groups with similar patterns and determines the cluster that corresponds to the prediction period, thereby predicting that the time-series data of the prediction period will show a pattern similar to that of the determined cluster.

For example, values of the form x(t), in which the value of x changes with time t, are called time-series data and can be expressed as follows.

X = [x(0), x(1), x(2), …]

If the current point in time is T, then predicting x(t) for t > T, having observed x(t) for t <= T, is called time-series forecasting.
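As a minimal illustration of this definition, the sketch below splits a series at a current time T into the observed part (t <= T) and the part to be forecast (t > T). The function name `split_at` and the sample values are hypothetical and are not taken from the patent.

```python
def split_at(series, T):
    """Split x(0..n-1) into observed x(t), t <= T, and future x(t), t > T."""
    if not 0 <= T < len(series):
        raise ValueError("T must index an existing observation")
    observed = series[: T + 1]   # x(0), ..., x(T)
    future = series[T + 1:]      # x(T+1), ...: the forecasting target
    return observed, future


X = [3.0, 4.0, 6.0, 5.0, 7.0, 8.0]   # illustrative values only
observed, future = split_at(X, T=3)
```

A forecasting model would be fitted on `observed` only and evaluated against `future`.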

To predict the future, information about the causes of an outcome must be modeled sufficiently. Without causal factors as input, the uncertainty of a future prediction cannot be reduced. Therefore, the various kinds of time-series data, metadata, and covariates that may be causes (causal factors) of the phenomenon to be predicted must be supplied as input. Here, covariates are values whose future is known. For example, it is known in advance which future dates will be public holidays.
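The idea of a covariate whose future values are known can be sketched as follows. The example builds a 0/1 holiday indicator for the days after a forecast origin; the function name `holiday_covariate` and the holiday calendar are hypothetical and serve only as an illustration.

```python
from datetime import date, timedelta


def holiday_covariate(start, horizon, holidays):
    """Return a 0/1 indicator for each of the `horizon` days after `start`."""
    days = [start + timedelta(days=k) for k in range(1, horizon + 1)]
    return [1 if d in holidays else 0 for d in days]


# Hypothetical holiday calendar, for illustration only.
holidays = {date(2021, 9, 20), date(2021, 9, 21), date(2021, 9, 22)}
flags = holiday_covariate(date(2021, 9, 15), horizon=7, holidays=holidays)
```

Unlike the target series, this indicator is available for the whole forecast horizon at prediction time, which is what makes it usable as an input.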

For example, suppose that, to predict future demand for a fashion brand's items, information on every product sold by that brand is supplied as input. With 600,000 products actually on sale × 2,000 sales channels × 14 sales days, this yields an input dimension on the order of 5.6 billion. No method exists that can model the joint relationship of input features at this scale all at once.
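The scale of this example can be checked with a few lines of arithmetic. The sketch below takes the three quoted factors at face value; note that their direct product comes to about 1.68 × 10^10, which does not match the quoted total of roughly 5.6 billion, although the conclusion that the joint space is far too large to model at once is unaffected. The variable names are illustrative.

```python
products = 600_000   # products on sale
channels = 2_000     # sales channels
days = 14            # sales days

input_dim = products * channels * days          # one input per (product, channel, day)
pairwise = input_dim * (input_dim - 1) // 2     # number of distinct feature pairs
```

Even restricting attention to pairs of features gives more than 10^20 candidates, so exhaustive evaluation of joint relationships is out of reach.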

Korean Patent Registration No. 10-2134682 describes a technique relating to a system and method for generating a prediction model for such real-time time-series data.

Korean Patent Registration No. 10-2134682

The embodiments describe a dynamic ultra-high-dimensional feature selection method and apparatus for time series prediction and causal factor analysis and, more specifically, provide a learning-based technique that picks out the information helpful for predicting the target of a time series prediction problem.

The embodiments aim to provide a dynamic ultra-high-dimensional feature selection method and apparatus for time series prediction and causal factor analysis that learn combinations of a plurality of input-space features, together with a metric of how effective each combination is at predicting a predicted value, and select, in a data-driven manner, the effective input-space features for each predicted value, thereby reducing the input dimension while minimizing the loss of information about the problem to be solved.

A dynamic ultra-high-dimensional feature selection method performed by a computer device according to an embodiment includes a step of selecting and learning causal factors that are helpful in predicting a predicted value in time series prediction, wherein the causal factors may include at least one of metadata directly connected to the predicted value, time-series input data, and input covariates.

The method may further include, after the causal factors are selected and learned, a step of outputting a final predicted value through a time series prediction model.

The step of selecting and learning the causal factors may include learning, through an algorithm, combinations of a plurality of input-space features and a metric of how effective each combination is at predicting a predicted value, and selecting, in a data-driven manner, the effective input-space features for predicting each predicted value.

The step of selecting and learning the causal factors may include: constructing an agent that proposes combinations of input-space features; learning, by the agent, the combinations of input-space features and the metric of how effective each combination is at predicting a predicted value; and, as learning progresses, selecting, by the agent, the input-space features that are helpful in predicting the predicted value of a given instance based on the metadata.

The step of selecting and learning the causal factors may include constructing a decoder when no limit is placed on the number of elements in the input-space feature combination.

The step of selecting and learning the causal factors may include, when a limit is placed on the number of elements in the input-space feature combination, using a module capable of proposing arbitrary feature instances to output that number of feature instances.

When based on reinforcement learning, the step of selecting and learning the causal factors may fix the instance to be predicted and optimize the loss.

The step of outputting the final predicted value through the time series prediction model may apply the loss of the time series prediction model based on the values derived when the causal factors are selected and learned, and may train end-to-end.

The step of outputting the final predicted value through the time series prediction model may include additionally constructing an encoder when no limit is placed on the number of elements in the input-space feature combination.

Because changing the features for every prediction instance makes the shape of the input manifold inconsistent, the step of outputting the final predicted value through the time series prediction model may include a step of applying a transformation after feature selection so that the manifold becomes consistent across instances.

A dynamic ultra-high-dimensional feature selection apparatus according to another embodiment includes a feature selection unit that selects and learns causal factors that are helpful in predicting a predicted value in time series prediction, wherein the causal factors may include at least one of metadata directly connected to the predicted value, time-series input data, and input covariates.

The apparatus may further include a time series prediction modeling unit that, after the causal factors are selected and learned, outputs a final predicted value through a time series prediction model.

The feature selection unit may learn, through an algorithm, combinations of a plurality of input-space features and a metric of how effective each combination is at predicting a predicted value, and may select, in a data-driven manner, the effective input-space features for predicting each predicted value.

The feature selection unit may construct an agent that proposes combinations of input-space features; the agent may learn the combinations of input-space features and the metric of how effective each combination is at predicting a predicted value; and, as learning progresses, the agent may select the input-space features that are helpful in predicting the predicted value of a given instance based on the metadata.

The feature selection unit may construct a decoder when no limit is placed on the number of elements in the input-space feature combination.

According to the embodiments, it is possible to provide a dynamic ultra-high-dimensional feature selection method and apparatus for time series prediction and causal factor analysis that learn combinations of a plurality of input-space features, together with a metric of how effective each combination is at predicting a predicted value, and select, in a data-driven manner, the effective input-space features for each predicted value, thereby reducing the input dimension while minimizing the loss of information about the problem to be solved.

FIG. 1 is a diagram schematically illustrating a dynamic ultra-high-dimensional feature selection method for time series prediction and causal factor analysis according to an embodiment.
FIG. 2 is a flowchart illustrating a dynamic ultra-high-dimensional feature selection method for time series prediction and causal factor analysis according to an embodiment.
FIG. 3 is a block diagram illustrating a dynamic ultra-high-dimensional feature selection apparatus for time series prediction and causal factor analysis according to an embodiment.

Hereinafter, embodiments are described with reference to the accompanying drawings. The described embodiments may, however, be modified into various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided to explain the present invention more completely to a person of ordinary skill in the art. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

In general, the methods for reducing the dimension of input data are dimensionality reduction and feature selection.

Dimensionality reduction projects high-dimensional data onto a lower-dimensional space. PCA, for example, is an unsupervised method and therefore does not look at the target data.

Feature selection, in contrast, picks out only some of the data. Mutual information, for example, makes greedy feature evaluation easy (the relationship between two variables, X vs. Y), but evaluating a joint relationship (X1, X2, …, Xn vs. Y) is difficult. That is, as the number of variables grows, the complexity becomes computationally intractable.
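The gap between bivariate and joint evaluation can be made concrete with a small sketch. It computes empirical mutual information for discrete variables and applies it to an XOR target, where each input alone carries no information about Y but the two inputs together determine it. The function name and data are illustrative assumptions, not part of the patent.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]        # y = x1 XOR x2

mi_x1 = mutual_information(x1, y)          # bivariate view of x1
mi_x2 = mutual_information(x2, y)          # bivariate view of x2
mi_joint = mutual_information(list(zip(x1, x2)), y)   # joint view of (x1, x2)

n_features = 30
n_subsets = 2 ** n_features - 1            # non-empty subsets to evaluate jointly
```

A greedy, one-feature-at-a-time score would discard both inputs here, while the joint score shows they are fully informative together. The subset count illustrates why evaluating every joint combination quickly becomes infeasible.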

The following embodiments seek to reduce the dimension of the input data while minimizing the loss of information about the problem to be solved.

To reduce the input dimension while minimizing the loss of information about the problem to be solved, the method must first make direct use of what the problem to be solved actually is (that is, it must be supervised). It must be able to pick out the information needed for each inference. It must capture not only greedy (bivariate) relationships but also joint relationships. It must work for both continuous and discrete variables. Finally, its complexity must be computationally tractable.

In addition, every prediction should be explainable in terms of what caused it. Once the requirement above of picking out the information needed for each inference is met, the information required for the inference has already been identified, so the cause of the inference (the causal factors) becomes explainable as a natural consequence.

The present embodiment proposes a learning-based technique that picks out the information helpful for predicting the target of the problem to be solved.

FIG. 1 is a diagram schematically illustrating a dynamic ultra-high-dimensional feature selection method for time series prediction and causal factor analysis according to an embodiment.

Referring to FIG. 1, in a time series prediction problem, for example, the problem to be solved may be the variable to be predicted. As the input dimension grows, it becomes difficult to find the joint relationship analytically, so the embodiments make maximal use of empirical methods.

First, according to an embodiment, the experimental setting may be defined as follows.

The value to be predicted can be expressed as in the following equation.

[Equation 1]

Here, the value to be predicted is defined by the prediction time point and the length of the prediction interval (the range of time steps to be forecast).

The metadata directly connected to the value to be predicted is represented separately. This metadata is directly related data that can explicitly describe the value to be predicted.

The input data that does not belong to the metadata is defined separately, as are the input covariates that do not belong to the metadata; each definition carries its own "where" constraint.

The data pools that contain the above input data and the above covariates, respectively, are each named separately.

A pair is then defined that consists of a combination of input-space features and a metric of how effective that combination is at predicting the value to be predicted.

The final prediction can be expressed as in the following equation.

[Equation 2]

According to an embodiment, the final goal can be stated as follows.

After a large number of (feature combination, metric) pairs have been observed, an algorithm can be provided that learns from them and, in a data-driven manner, picks out the effective feature combination for each prediction. This can be expressed as follows.

[Equation 3]

Here, the condition for a feature combination to be effective is stated in terms of a loss function that can measure the accuracy of the prediction. The goal is to achieve the best possible prediction accuracy in this way.
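One way to write this goal compactly is sketched below. Every symbol is notation assumed here purely for illustration and should not be read as the patent's own: $y$ is the value to be predicted, $m$ its metadata, $\mathcal{P}$ the pool of candidate input features, $S$ the selected combination, $f$ the forecasting model, $L$ the loss function, and $A$ the selection algorithm.

```latex
\begin{aligned}
S &= A(m), \qquad S \subseteq \mathcal{P},\\
\hat{y} &= f(m, S),\\
A^{*} &= \arg\min_{A}\; \mathbb{E}\big[\, L\big(y,\; f(m, A(m))\big) \,\big].
\end{aligned}
```

Read this way, a combination $S$ is "effective" for an instance when using it yields a low loss $L(y, \hat{y})$, and the selection algorithm is chosen to make that loss small on average over instances.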

The detailed step-by-step goals are to provide 1) a method of constructing the algorithm and 2) the construction of the component used to evaluate a proposed feature combination.

The method of constructing the algorithm is described below.

The algorithm can be constructed using an approach based on deep learning.

First, an agent that proposes feature combinations is constructed.

The agent continuously learns the (feature combination, metric) pairs.

As learning progresses, the agent must become able to pick out, based on the metadata, the feature combination that is helpful in predicting a given instance.

In this process, the number of elements in the feature combination may either be limited or left unlimited.

Leaving the number of elements in the feature combination unlimited requires a decoder. A decoder is constructed that, based on the current metadata, can propose an arbitrary number of elements.

When the number of elements in the feature combination is limited, a module capable of proposing arbitrary feature instances is used to output that number of feature instances.
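The difference between the limited and unlimited cases can be sketched as follows. In this toy version the per-feature scores are given directly; in the described method they would come from a learned model conditioned on the metadata. The function names, the stop threshold, and the feature names are all hypothetical.

```python
def propose_fixed(scores, k):
    """Limited case: return exactly k feature names with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def propose_variable(scores, stop_threshold):
    """Unlimited case: emit features one at a time, decoder-style, until the
    next candidate's score falls below a stop threshold."""
    chosen = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if scores[name] < stop_threshold:
            break   # plays the role of an end-of-sequence signal
        chosen.append(name)
    return chosen


# Hypothetical relevance scores for one prediction instance.
scores = {"price": 0.9, "holiday": 0.7, "weather": 0.4, "channel_id": 0.1}
fixed = propose_fixed(scores, k=2)
variable = propose_variable(scores, stop_threshold=0.3)
```

The fixed-size proposal always yields an input of the same shape, whereas the variable-length proposal produces inputs of differing length, which is why a sequence-style decoder is needed on the proposing side.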

When reinforcement learning is used, the instance to be predicted can be fixed, and an agent that optimizes the loss within that setting can be trained. In other words, the environment can be fixed.

Any technique capable of minimizing the loss can be used for learning.
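The loop of proposing a feature combination, observing how well it predicts, and updating the learned metric can be sketched with a very simple stand-in. The example below uses an epsilon-greedy bandit over fixed-size subsets in a fixed environment; this is a deliberately simplified substitute for the deep-learning or reinforcement-learning agent described above, and `toy_loss`, the feature names, and all parameters are hypothetical.

```python
import random
from itertools import combinations


def toy_loss(subset):
    """Stand-in for the forecasting loss; lower is better. Hypothetical."""
    informative = {"price", "holiday"}
    return 1.0 - 0.4 * len(informative & set(subset))


def train_agent(pool, k, steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    arms = list(combinations(sorted(pool), k))
    value = {a: 0.0 for a in arms}   # learned metric: mean reward per subset
    count = {a: 0 for a in arms}
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.choice(arms)                 # explore a random proposal
        else:
            arm = max(arms, key=value.get)         # exploit the current estimate
        reward = -toy_loss(arm)                    # observe a (subset, metric) pair
        count[arm] += 1
        value[arm] += (reward - value[arm]) / count[arm]   # running mean
    return max(arms, key=value.get), value


best, value = train_agent({"price", "holiday", "weather", "channel_id"}, k=2)
```

Because all rewards are negative and the estimates start at zero, untried subsets look attractive at first, so every subset is evaluated before the agent settles on the one with the lowest loss. A tabular approach like this only works for tiny pools; at the scale discussed in this document a parameterized proposer is required.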

Alternatively, the algorithm may be constructed using an approach that is not learning-based.

The search range is expanded outward, starting from the features with high greedy feature selection scores. In other words, the search can begin after meaningless features have first been removed.
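A minimal sketch of this non-learning approach, under stated assumptions: features are first pruned by an individual (bivariate) score, and the subset is then grown from the highest-scoring features, keeping a feature only if it lowers a joint loss. The function `greedy_select`, the toy loss, and the thresholds are hypothetical.

```python
def greedy_select(scores, joint_loss, min_score, max_size):
    """Prune low-scoring features, then grow the subset greedily.

    scores     : per-feature (bivariate) relevance score, higher is better
    joint_loss : callable mapping a tuple of features to a loss, lower is better
    """
    # Step 1: remove features whose individual score is negligible.
    candidates = [f for f in sorted(scores, key=scores.get, reverse=True)
                  if scores[f] >= min_score]
    # Step 2: expand the search from the highest-scoring features outward.
    selected, best = [], float("inf")
    for f in candidates:
        if len(selected) >= max_size:
            break
        trial = joint_loss(tuple(selected + [f]))
        if trial < best:            # keep f only if the joint loss improves
            selected, best = selected + [f], trial
    return selected, best


def toy_joint_loss(subset):
    """Hypothetical joint loss: 'weather' slightly hurts when added."""
    gains = {"price": 0.4, "holiday": 0.3, "weather": -0.05}
    return 1.0 - sum(gains.get(f, 0.0) for f in subset)


scores = {"price": 0.9, "holiday": 0.7, "weather": 0.4, "channel_id": 0.01}
selected, best = greedy_select(scores, toy_joint_loss, min_score=0.05, max_size=3)
```

The pruning step keeps the search space small, at the cost of possibly discarding features that are only useful jointly with others.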

The construction of the component used to evaluate a proposed feature combination is described below.

This component can be constructed using an approach based on deep learning.

First, a time-series forecasting model is constructed, and the loss of that model is used as the evaluation quantity. In this case, the entire process can be trained end-to-end.

If, as described above, no limit is placed on the number of elements in the feature combination, an encoder is additionally required.

When the elements of a proposal are permuted, a problem can arise because typical deep learning models are not position-invariant. The architecture of the time series prediction model must therefore be one that does not depend on element position. For example, an attention-based model over key-value pairs can be applied.
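The reason attention suits this requirement can be shown with a small sketch: a single-query softmax attention over (key, value) pairs is a weighted sum, so reordering the pairs leaves the output unchanged, provided no positional encoding is added. The function name and the numbers are illustrative assumptions.

```python
from math import exp


def attention_pool(query, keys, values):
    """Softmax attention of one query over (key, value) pairs.
    The result is a weighted sum, so it does not depend on pair order."""
    logits = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(logits)                               # subtract max for stability
    weights = [exp(l - m) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[2.0], [4.0], [6.0]]
out = attention_pool(query, keys, values)

# Present the same selected features in a different order.
order = [2, 0, 1]
out_perm = attention_pool(query,
                          [keys[i] for i in order],
                          [values[i] for i in order])
```

Here `out` and `out_perm` agree up to floating-point rounding, whereas a model that concatenates its inputs in order would generally produce different outputs for the two orderings.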

If the features change for every prediction instance, the shape of the input manifold is inconsistent, and typical deep learning models can no longer unfold the manifold. To address this, it may be necessary to add a transformation, applied after feature selection, that makes the manifold consistent across instances.

Meanwhile, the component used to evaluate a proposed feature combination can also be constructed in a non-learning-based way.

In the non-learning-based case, it is constructed so that the difference between the group with similar values and the group with dissimilar values becomes large.

FIG. 2 is a flowchart illustrating a dynamic ultra-high-dimensional feature selection method for time series prediction and causal factor analysis according to an embodiment.

Referring to FIG. 2, a dynamic ultra-high-dimensional feature selection method performed by a computer device according to an embodiment may include a step (S110) of selecting and learning causal factors that are helpful in predicting a predicted value in time series prediction. Here, the causal factors may include at least one of metadata directly connected to the predicted value, time-series input data, and input covariates.

The method may further include a step (S120) of outputting a final predicted value through a time series prediction model after the causal factors are selected and learned.

The dynamic ultra-high-dimensional feature selection method for time series prediction and causal factor analysis according to an embodiment is described in more detail below, using the corresponding dynamic ultra-high-dimensional feature selection apparatus as an example.

FIG. 3 is a block diagram illustrating a dynamic ultra-high-dimensional feature selection apparatus for time series prediction and causal factor analysis according to an embodiment.

Referring to FIG. 3, a dynamic ultra-high-dimensional feature selection apparatus 300 for time series prediction and causal factor analysis according to an embodiment may include a feature selection unit 310 and a time series prediction modeling unit 320.
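The two-unit structure can be sketched as a small pipeline in which one object selects features (step S110) and another produces the forecast from the selected features (step S120). The class names mirror the reference numerals only for readability; the scoring function and the placeholder forecaster are hypothetical stand-ins, not the patent's models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class FeatureSelectionUnit:               # plays the role of unit 310
    score: Callable[[dict, str], float]   # hypothetical relevance scorer
    k: int

    def select(self, metadata: dict, pool: Dict[str, Sequence[float]]) -> List[str]:
        ranked = sorted(pool, key=lambda name: self.score(metadata, name),
                        reverse=True)
        return ranked[: self.k]


@dataclass
class ForecastModelingUnit:               # plays the role of unit 320
    def predict(self, selected: Dict[str, Sequence[float]], horizon: int) -> List[float]:
        # Placeholder forecaster: repeat the mean of the last observed values.
        last = [series[-1] for series in selected.values()]
        level = sum(last) / len(last)
        return [level] * horizon


@dataclass
class Apparatus300:
    selector: FeatureSelectionUnit
    forecaster: ForecastModelingUnit

    def run(self, metadata, pool, horizon):
        names = self.selector.select(metadata, pool)                  # step S110
        inputs = {n: pool[n] for n in names}
        return names, self.forecaster.predict(inputs, horizon)       # step S120


pool = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [9.0, 9.0]}
metadata = {"weights": {"a": 0.9, "b": 0.8, "c": 0.1}}
score = lambda meta, name: meta["weights"].get(name, 0.0)
apparatus = Apparatus300(FeatureSelectionUnit(score, k=2), ForecastModelingUnit())
names, forecast = apparatus.run(metadata, pool, horizon=3)
```

Returning the selected feature names alongside the forecast reflects the causal factor analysis aspect: the selection itself indicates which inputs the prediction relied on.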

In step S110, the feature selection unit 310 may select and learn causal factors that are helpful in predicting a predicted value in time series prediction. The learning may be performed using a deep learning method. Here, the causal factors may include at least one of metadata directly connected to the predicted value, time-series input data, and input covariates.

The feature selection unit 310 may learn, through an algorithm, combinations of a plurality of input-space features and a metric of how effective each combination is at predicting a predicted value, and may select, in a data-driven manner, the effective input-space features for predicting each predicted value. Here, the algorithm may refer to the selection algorithm described above.

More specifically, the feature selection unit 310 may construct an agent that proposes combinations of input-space features, and the agent may learn the combinations of input-space features together with the metric of how effective each combination is at predicting the predicted value. That is, the agent may continuously learn the (feature combination, metric) pairs. As learning progresses, the agent may select, based on the metadata, the input-space features that are helpful in predicting the predicted value of a given instance.

Here, when no limit is placed on the number of elements in the input-space feature combination, the feature selection unit 310 may construct a decoder. When a limit is placed on the number of elements, the feature selection unit 310 may use a module capable of proposing arbitrary feature instances to output that number of feature instances.

Meanwhile, when based on reinforcement learning, the feature selection unit 310 may fix the instance to be predicted and optimize the loss.

Alternatively, the algorithm may be constructed using an approach that is not learning-based. The search range is expanded outward, starting from the features with high greedy feature selection scores. In other words, the search can begin after meaningless features have first been removed.

In step S120, after the causal factors are selected and learned, the time series prediction modeling unit 320 may output a final predicted value through the time series prediction model.

The time series prediction modeling unit 320 may apply the loss of the time series prediction model based on the values derived when the causal factors are selected and learned, and may train end-to-end. That is, the component used to evaluate a proposed feature combination may be constructed, and it may be constructed using an approach based on deep learning.

Here, when no limit is placed on the number of elements in the input-space feature combination, the time series prediction modeling unit 320 may additionally construct an encoder.

Because changing the features for every prediction instance makes the shape of the input manifold inconsistent, the time series prediction modeling unit 320 may apply a transformation after feature selection so that the manifold becomes consistent across instances.

Meanwhile, the component used to evaluate a proposed feature combination can also be constructed in a non-learning-based way. In that case, it is constructed so that the difference between the group with similar values and the group with dissimilar values becomes large.

According to the embodiments, combinations of a plurality of input-space features are learned together with a metric of how effective each combination is at predicting a predicted value, and the effective input-space features for each predicted value are selected in a data-driven manner, so that the input dimension can be reduced while minimizing the loss of information about the problem to be solved.

The devices described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but those skilled in the art will recognize that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A dynamic ultra-high-dimensional feature selection method performed by a computer device, the method comprising:
selecting and learning causal factors that are helpful in predicting a prediction value in time series prediction,
wherein the causal factors include at least one of metadata directly linked to the prediction value, time-series input data, and input covariates.

The feature selection method of claim 1, further comprising:
after selecting and learning the causal factors, outputting a final prediction value through a time series prediction model.

The feature selection method of claim 1, wherein the selecting and learning of the causal factors comprises:
learning, through an algorithm, combinations of a plurality of input space features and a metric of how effective each combination of the input space features is in predicting the prediction value, and selecting, in a data-driven manner, the input space features that are valid for predicting each prediction value.

The feature selection method of claim 1, wherein the selecting and learning of the causal factors comprises:
constructing an agent that proposes combinations of input space features;
learning, by the agent, the combinations of the input space features and a metric of how effective each combination of the input space features is in predicting the prediction value; and
selecting, by the agent as learning progresses, the input space features that are helpful in predicting the prediction value of a given instance based on the metadata.

The feature selection method of claim 4, wherein the selecting and learning of the causal factors comprises:
configuring a decoder when the number of elements in the input space feature is not limited.

The feature selection method of claim 4, wherein the selecting and learning of the causal factors comprises:
when the number of elements in the input space feature is limited, outputting as many feature instances as that number using a module capable of proposing arbitrary feature instances.

The feature selection method of claim 1, wherein the selecting and learning of the causal factors, when based on reinforcement learning, fixes the instance to be predicted and optimizes the loss.

The feature selection method of claim 2, wherein the outputting of the final prediction value through the time series prediction model applies the loss of the time series prediction model based on the values derived when selecting and learning the causal factors, and performs end-to-end learning.

The feature selection method of claim 2, wherein the outputting of the final prediction value through the time series prediction model comprises:
additionally configuring an encoder when the number of elements in the input space feature is not limited.

The feature selection method of claim 2, wherein the outputting of the final prediction value through the time series prediction model comprises:
because the shape of the input manifold is inconsistent when the features change at every prediction instance, applying a transformation after feature selection so that the manifold is consistent across instances.

A dynamic ultra-high-dimensional feature selection device, comprising:
a feature selection unit that selects and learns causal factors that are helpful in predicting a prediction value in time series prediction,
wherein the causal factors include at least one of metadata directly linked to the prediction value, time-series input data, and input covariates.

The feature selection device of claim 11, further comprising:
a time series prediction modeling unit that, after the causal factors are selected and learned, outputs a final prediction value through a time series prediction model.

The feature selection device of claim 11, wherein the feature selection unit learns, through an algorithm, combinations of a plurality of input space features and a metric of how effective each combination of the input space features is in predicting the prediction value, and selects, in a data-driven manner, the input space features that are valid for predicting each prediction value.

The feature selection device of claim 11, wherein the feature selection unit constructs an agent that proposes combinations of input space features, the agent learns the combinations of the input space features and a metric of how effective each combination of the input space features is in predicting the prediction value, and, as the learning progresses, the agent selects the input space features that are helpful in predicting the prediction value of a given instance based on the metadata.

The feature selection device of claim 14, wherein the feature selection unit configures a decoder when the number of elements in the input space feature is not limited.