KR20230088967A

KR20230088967A - Anomaly detection and repair based electrical load forecasting device and method

Info

Publication number: KR20230088967A
Application number: KR1020210177250A
Authority: KR
Inventors: 황인준; 박성우; 정승민; 정승원
Original assignee: 고려대학교 산학협력단
Priority date: 2021-12-13
Filing date: 2021-12-13
Publication date: 2023-06-20
Anticipated expiration: 2041-12-13
Also published as: KR102593144B1

Abstract

The present invention relates to an apparatus and method for predicting power demand, and more particularly, to an apparatus and method capable of accurately predicting future power demand through machine learning. An apparatus for predicting power demand according to an embodiment of the present application includes: a data collection unit that collects weather data, power data, and time data; a data preprocessing unit that performs a preprocessing operation on the weather data, the power data, and the time data; an outlier detection unit that is implemented with a variational autoencoder and classifies the preprocessed power data into abnormal data and normal data; an outlier restoration unit that is implemented with a random forest model and restores the abnormal data to generate restored data; and a power demand prediction unit that is implemented with a sliding window-based LightGBM model, trains a prediction model based on the normal data and the restored data, and predicts power demand. The apparatus detects outliers, restores them, and uses them as training data. By restoring outliers and using them as training data when the amount of data is insufficient, the apparatus can provide improved prediction performance without overfitting.

Description

Apparatus and method for predicting power demand based on anomaly detection and restoration

The present invention relates to an apparatus and method for predicting power demand, and more particularly, to an apparatus and method capable of accurately predicting future power demand through machine learning.

The smart grid is attracting considerable attention as a feasible solution to the environmental and resource depletion problems arising worldwide. Accurate power demand forecasting is required in order to use the various systems that make up a smart grid efficiently, such as energy storage systems, energy management systems, and renewable energy systems.

With recent advances in computer technology, prediction models based on machine learning and deep learning are being actively researched and are showing good performance. Such prediction models are strongly affected by the quantity and quality of the data.

If the data used to train a prediction model contains many outliers or missing values, training may be disrupted and prediction accuracy may drop. When enough data has been collected for training, removing the outliers and missing values before training causes no problem. When the amount of data is insufficient, however, removing them may make training itself difficult because of overfitting.

An object of the present invention is to provide a power demand prediction apparatus that can improve the performance of a prediction model without overfitting, through outlier detection and restoration, when the amount of data is insufficient.

An apparatus for predicting power demand according to an embodiment of the present application includes: a data collection unit that collects weather data, power data, and time data; a data preprocessing unit that performs a preprocessing operation on the weather data, the power data, and the time data; an outlier detection unit that is implemented with a variational autoencoder and classifies the preprocessed power data into abnormal data and normal data; an outlier restoration unit that is implemented with a random forest model and restores the abnormal data to generate restored data; and a power demand prediction unit that is implemented with a sliding window-based LightGBM model, trains a prediction model based on the normal data and the restored data, and predicts power demand.

In an embodiment, the data preprocessing unit includes: a weather data preprocessing unit that performs a normalization operation on the weather data; a power data preprocessing unit that performs a normalization operation on the power data; and a time data preprocessing unit that converts the time data into two-dimensional time data.

In an embodiment, the weather data preprocessing unit generates apparent temperature data and discomfort index data based on the temperature, humidity, and wind speed data among the weather data, and performs a normalization operation on the generated apparent temperature data and discomfort index data.

In an embodiment, the time data preprocessing unit uses periodic functions to convert the time data into two different two-dimensional time data.

In an embodiment, the weather data collected by the data collection unit is either weather forecast data or measured weather data.

In an embodiment, the power demand prediction unit has a window size of 7.

A method for predicting power demand according to an embodiment of the present application includes: collecting, by a data collection unit, weather data, time data, and power data; performing, by a data preprocessing unit, a preprocessing operation on the weather data, the time data, and the power data; detecting, by an outlier detection unit, outliers in the preprocessed power data; restoring, by an outlier restoration unit, the outliers to generate restored data; and training, by a power demand prediction unit, a prediction model using the restored data as training data, and predicting power demand based on the trained prediction model.

In an embodiment, performing the preprocessing operation includes: normalizing the weather data and the power data; and converting the time data into two different time data using periodic functions.

In an embodiment, normalizing the weather data includes: generating apparent temperature data and discomfort index data based on the temperature, humidity, and wind speed data among the weather data; and performing a normalization operation on the generated apparent temperature data and the generated discomfort index data.

In an embodiment, the weather data is either weather forecast data or measured weather data.

In an embodiment, the outlier detection unit is implemented with a variational autoencoder.

In an embodiment, the outlier restoration unit is implemented with a random forest model.

In an embodiment, the power demand prediction unit is implemented with a sliding window-based LightGBM model.

In an embodiment, the power demand prediction unit has a window size of 7.

The power demand prediction apparatus according to an embodiment of the present application detects outliers, restores them, and uses them as training data. By restoring outliers and using them as training data when the amount of data is insufficient, the apparatus can provide improved prediction performance without overfitting.

FIG. 1 is a block diagram showing a power demand prediction apparatus 10 according to an embodiment of the present application.
FIG. 2 is a block diagram showing the data preprocessing unit 200 of FIG. 1 in more detail.
FIG. 3A is a diagram showing an example of categorical time data (Data_T).
FIG. 3B is a diagram showing an example of two-dimensional time data (Data_T1, Data_T2).
FIG. 4 is a diagram showing the outlier detection unit 300 of FIG. 1 in more detail.
FIG. 5 is a graph showing the results of an outlier detection experiment for the outlier detection unit 300.
FIG. 6 is a diagram showing the outlier restoration unit 400 of FIG. 1 in more detail.
FIG. 7 is a graph showing the results of a restoration experiment for the outlier restoration unit 400.
FIG. 8 is a diagram showing the power demand prediction unit 500 of FIG. 1 in more detail.
FIG. 9 is a graph showing experimental results used to determine the window size of the power demand prediction unit 500.
FIG. 10 is a graph comparing the prediction results of the power demand prediction apparatus 10 according to an embodiment of the present application with those of other models.
FIG. 11 is a flowchart showing the operation of the power demand prediction apparatus 10 of FIG. 1.

Hereinafter, embodiments of the present application are described in more detail with reference to the accompanying drawings, in sufficient detail that a person of ordinary skill in the art can readily practice the technical idea of the present application.

FIG. 1 is a block diagram showing a power demand prediction apparatus 10 according to an embodiment of the present application.

Referring to FIG. 1, the power demand prediction apparatus 10 includes a data collection unit 100, a data preprocessing unit 200, an outlier detection unit 300, an outlier restoration unit 400, and a power demand prediction unit 500.

The data collection unit 100 collects the various data needed to predict power demand. For example, the data collection unit 100 may collect weather data, power data, and time data.

For example, the data collection unit 100 may collect various weather data from external sources. The weather data may be, for example, data on daily maximum temperature, daily minimum temperature, temperature, humidity, wind speed, total cloud cover, and precipitation. The data collection unit 100 may collect the weather data through, for example, the Open MET Data Portal of the Korea Meteorological Administration.

For example, the data collection unit 100 may collect, from at least one cluster, power data on the amount of power consumed. In this case, the clusters may be buildings used for different purposes in order to prevent bias in the data. For example, the data collection unit 100 may collect power data from cluster A consisting of educational buildings, cluster B consisting of dormitories, cluster C consisting of engineering college laboratories, and cluster D consisting of science college laboratories. This is only an example, and the number and types of clusters may be set in various ways.

For example, when collecting the weather and power data, the data collection unit 100 may also collect the corresponding time data. When collecting weather data, it may also collect time data for the month, day, hour, and minute. Likewise, when collecting power data, it may also collect time data for the month, day, hour, and minute.

In one embodiment of the present application, the data collection unit 100 may collect not only actually measured weather data but also weather forecast data that was issued a predetermined period before each past point in time. For example, when collecting the power data and weather data for May 5, 2018, the weather data corresponding to May 5, 2018 may be the weather forecast data issued one day earlier, on May 4, 2018, covering daily minimum temperature, daily maximum temperature, daily average temperature, temperature, humidity, wind speed, total cloud cover, precipitation, and the like. By collecting weather forecast data together with actually measured weather data and providing both as training data for the learning model, the power demand prediction apparatus 10 according to an embodiment of the present application can perform training that also takes into account the error between the forecast and the actual weather.

This is only an example, however. The data collection unit 100 may collect only actually measured weather data, in which case training is performed using only the measured weather data. As another example, the data collection unit 100 may collect only weather forecast data, in which case training is performed using only the weather forecast data.

The data preprocessing unit 200 receives the weather data, time data, and power data from the data collection unit 100. It performs a preprocessing operation on the received weather data, power data, and time data so that they can be used by the outlier detection unit 300.

For example, the data preprocessing unit 200 may receive the weather data and power data from the data collection unit 100 and perform a normalization operation on them. The data preprocessing unit 200 may also receive one-dimensional time data from the data collection unit 100 and convert it into two-dimensional time data. The configuration and operation of the data preprocessing unit 200 are described in more detail below with reference to FIGS. 2 and 3.

The outlier detection unit 300 receives the preprocessed power data from the data preprocessing unit 200 and may detect outliers in it.

In one embodiment of the present application, the outlier detection unit 300 may detect outliers using a variational autoencoder (VAE). Because a variational autoencoder generates its output by learning the distribution of the input values, the outlier detection unit 300 can better detect outliers that deviate from the typical power demand distribution. The outlier detection unit 300 is described in more detail below with reference to FIGS. 4 and 5.

The outlier restoration unit 400 receives the abnormal data from the outlier detection unit 300. It restores the abnormal data based on the other input variables at the corresponding point in time, thereby generating restored data.

In one embodiment of the present application, the outlier restoration unit 400 may generate the restored data from the abnormal data using a random forest (RF) model. The outlier restoration unit 400 is described in more detail below with reference to FIGS. 6 and 7.

The power demand prediction unit 500 receives input data, including the restored data, from the outlier restoration unit 400. Here, the input data includes the data preprocessed by the data preprocessing unit 200 and the data restored by the outlier restoration unit 400. The power demand prediction unit 500 may use the input data to train a model for predicting power demand.
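The data flow described here, in which normal readings are kept and abnormal readings are replaced by restored values so that no training sample is lost, can be sketched roughly as follows. This is only an illustration: `build_training_series` and `placeholder_restore` are hypothetical names, and the placeholder simply returns the mean of the normal readings. It stands in for the random forest model and is not the restoration method itself.

```python
def build_training_series(power, is_abnormal, restore):
    """Keep normal readings and replace abnormal ones with restored estimates.

    `restore` is any callable mapping a time index to an estimated value;
    in the apparatus this role is played by the outlier restoration unit.
    """
    return [
        restore(i) if flagged else value
        for i, (value, flagged) in enumerate(zip(power, is_abnormal))
    ]


# Hypothetical normalized power readings; index 2 is flagged as abnormal.
power = [0.40, 0.42, 3.90, 0.41]
flags = [False, False, True, False]

normal_values = [v for v, f in zip(power, flags) if not f]


def placeholder_restore(index):
    # Assumption: a trivial stand-in estimator (mean of the normal readings).
    return sum(normal_values) / len(normal_values)


series = build_training_series(power, flags, placeholder_restore)
print(len(series))  # same number of samples as the input
```

The point of the sketch is only that the series keeps its original length, which is what lets the prediction model train on the full set of time points.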

In one embodiment of the present application, the power demand prediction unit 500 may be implemented with a sliding window-based LightGBM model. The power demand prediction unit 500 is described in more detail below with reference to FIGS. 8 and 9.

As described above, the power demand prediction apparatus 10 according to an embodiment of the present application detects outliers with a variational autoencoder and restores them with a random forest model. Stable training data can therefore be provided even when the amount of data is insufficient, so the power demand prediction apparatus 10 can provide improved prediction performance without overfitting. In addition, the power demand prediction apparatus 10 implements its prediction model with a sliding window-based LightGBM model, which allows the most recent data patterns close to the prediction time to be reflected appropriately and further improves prediction performance.
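As a rough illustration of how a sliding window keeps the training set close to the prediction time, the sketch below generates the index splits only. It assumes that each step trains on the most recent `window` periods and predicts the next one. The unit of a period (day, week, and so on) is not stated in this section, the function name is hypothetical, and fitting the LightGBM model itself is omitted.

```python
def sliding_window_splits(n_periods, window=7):
    """Yield (train_indices, test_index) pairs.

    Each split trains on the `window` periods immediately before the
    period being predicted, so older data drops out as the window slides.
    """
    for test_index in range(window, n_periods):
        yield list(range(test_index - window, test_index)), test_index


for train_idx, test_idx in sliding_window_splits(10, window=7):
    print(train_idx, "->", test_idx)
```

With 10 periods and a window of 7, this produces three splits, and the oldest period leaves the training set each time the window advances.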

FIG. 2 is a block diagram showing the data preprocessing unit 200 of FIG. 1 in more detail.

Referring to FIG. 2, the data preprocessing unit 200 includes a weather data preprocessing unit 210, a power data preprocessing unit 220, and a time data preprocessing unit 230.

The weather data preprocessing unit 210 receives weather data (Data_W) from the data collection unit 100 and performs a preprocessing operation on it to generate first weather data (Data_W1).

For example, the weather data preprocessing unit 210 may receive data such as daily maximum temperature, daily minimum temperature, temperature, humidity, wind speed, total cloud cover, and precipitation from the data collection unit 100. It may generate the first weather data (Data_W1) by performing a normalization operation on the received weather data.

The weather data preprocessing unit 210 may also generate apparent temperature data and discomfort index data based on the received temperature, humidity, and wind speed data. It may then normalize the apparent temperature data and the discomfort index data to generate second weather data (Data_W2).
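The source does not state which formulas are used for the apparent temperature and the discomfort index, nor which normalization is applied. The sketch below therefore rests on assumptions: the commonly used wind-chill formula (temperature in °C, wind speed in km/h, generally intended for cold and windy conditions), the temperature-humidity discomfort index, and min-max scaling. All function names are hypothetical.

```python
def wind_chill_c(temp_c, wind_kmh):
    """Assumed apparent-temperature formula: wind chill in deg C."""
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v


def discomfort_index(temp_c, rel_humidity_pct):
    """Assumed discomfort index from temperature (deg C) and humidity (%)."""
    return 0.81 * temp_c + 0.01 * rel_humidity_pct * (0.99 * temp_c - 14.3) + 46.3


def min_max_normalize(values):
    """Assumed normalization: scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical hourly readings: (temperature deg C, humidity %, wind km/h).
readings = [(28.0, 60.0, 5.0), (30.0, 70.0, 10.0), (33.0, 80.0, 8.0)]
di_normalized = min_max_normalize([discomfort_index(t, h) for t, h, _ in readings])
print(di_normalized)
```

Derived features such as these would form Data_W2 after normalization, alongside the directly normalized weather variables in Data_W1.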

The power data preprocessing unit 220 may receive, from the data collection unit 100, power data on the amount of power consumed in at least one cluster. It may perform a preprocessing operation on the received power data to generate preprocessed power data (Data_Pp).

The time data preprocessing unit 230 receives time data (Data_T) from the data collection unit 100. Here, the time data may be one-dimensional data that is well suited to representing categorical information. The time data preprocessing unit 230 may output the one-dimensional time data (Data_T) as is, or convert it into two-dimensional time data (Data_T1, Data_T2) that can reflect periodicity.

In more detail, one-dimensional time data represents categorical information well: the month (January, February, and so on), the day (the 1st, the 2nd, and so on), and the hour (1 o'clock, 2 o'clock, and so on). It does not represent periodicity well, however. For example, although 23:00 and 0:00 are consecutive hours, they differ by 23 in the one-dimensional representation.

Accordingly, so that the periodicity of time is properly reflected, the time data preprocessing unit 230 may convert the one-dimensional time data (Data_T) into two-dimensional time data (Data_T1, Data_T2). The conversion may be performed using periodic functions such as the sine (sin) and cosine (cos) functions. For example, the time data preprocessing unit 230 may convert the one-dimensional time data (Data_T) into the two-dimensional time data (Data_T1, Data_T2) using the following formula.
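The formula itself is not reproduced in this text. A reconstruction consistent with the definitions of "time" and "cycle" given next, assuming the usual sine and cosine encoding with the angle expressed in radians, would be the following. The symbol names $t_{\sin}$ and $t_{\cos}$ are placeholders, and the original may express the angle in degrees instead.

```latex
\begin{aligned}
t_{\sin} &= \sin\!\left(\frac{2\pi \cdot \mathrm{time}}{\mathrm{cycle}}\right),\\
t_{\cos} &= \cos\!\left(\frac{2\pi \cdot \mathrm{time}}{\mathrm{cycle}}\right)
\end{aligned}
```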

Here, "cycle" denotes the period of the time data. For example, when "time" is month data, "cycle" may be 12; when "time" is day data, "cycle" may be the number of days in the corresponding month; and when "time" is hour data, "cycle" may be 24. The two values produced by the formula correspond to Data_T1 and Data_T2 of FIG. 2, respectively.

The time data preprocessing unit 230 uses two trigonometric values to form a two-dimensional representation for the following reason. If time were expressed with a single trigonometric function, for example one with a period of 12, two different x values would map to the same y value, and the point in time could not be identified from the y value alone. To resolve this, the time data preprocessing unit 230 generates the two-dimensional data (Data_T1, Data_T2) using two trigonometric functions that take different y values at the same x value.
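Both points, that consecutive hours end up close together and that a single trigonometric value is ambiguous, can be checked with a small sketch. It assumes the standard radian-based sine and cosine encoding. `encode_cyclical` and `distance` are hypothetical helper names, not names from the source.

```python
import math


def encode_cyclical(time_value, cycle):
    """Map a time value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * time_value / cycle
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


h23 = encode_cyclical(23, 24)
h0 = encode_cyclical(0, 24)
h12 = encode_cyclical(12, 24)
# 23:00 and 0:00 are neighbours on the circle; 12:00 lies on the opposite side.
print(distance(h23, h0) < distance(h12, h0))

m1 = encode_cyclical(1, 12)
m5 = encode_cyclical(5, 12)
# The sine values of months 1 and 5 coincide, so sine alone is ambiguous;
# the cosine values differ and resolve the ambiguity.
print(math.isclose(m1[0], m5[0]), math.isclose(m1[1], m5[1]))
```

In the one-dimensional representation, hours 23 and 0 differ by 23, whereas their encoded points are separated by only a short chord on the unit circle.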

As described above, the data preprocessing unit 200 may perform preprocessing operations on the weather data (Data_W), the power data (Data_P), and the time data (Data_T). In particular, it may output both the one-dimensional time data (Data_T), which represents categorical information well, and the two-dimensional time data (Data_T1, Data_T2), which represents periodicity well.

FIG. 3 is a diagram showing an example of the time data output by the time data preprocessing unit 230 of FIG. 2. Specifically, FIG. 3A shows an example of the categorical time data (Data_T), and FIG. 3B shows an example of the two-dimensional time data (Data_T1, Data_T2).

As shown in FIG. 3, the time data preprocessing unit 230 may perform a preprocessing operation on the time data (Data_T) to generate the two-dimensional time data (Data_T1, Data_T2).

FIG. 4 is a diagram showing the outlier detection unit 300 of FIG. 1 in more detail, and FIG. 5 is a graph showing the results of an outlier detection experiment for the outlier detection unit 300.

Referring to FIG. 4, the outlier detection unit 300 receives the preprocessed power data (Data_Pp) from the data preprocessing unit 200. It may detect outliers in the preprocessed power data (Data_Pp) and thereby separate the data into normal data and abnormal data.

In an embodiment of the present application, the outlier detection unit 300 may be implemented with a variational autoencoder. That is, the outlier detection unit 300 may train a variational autoencoder on the preprocessed power data and detect outliers using the difference between the input values and the output values reconstructed by the trained variational autoencoder.

If the outlier detection unit were implemented with a plain autoencoder (AE), it would reconstruct all input variables and then sum the reconstruction errors of the individual variables. It would judge a sample to be an outlier when the sum of the reconstruction errors is at or above a certain value. In that case, however, there is a risk that the autoencoder-based outlier detection unit flags an outlier even when the actual power consumption is not anomalous.

For example, suppose that heavy rain suddenly falls at some point in the summer because of the rainy season. Because the autoencoder-based outlier detection unit considers the reconstruction errors of all input variables, a large error in the reconstructed precipitation-related variable may cause it to wrongly judge the data point to be an outlier, even though power consumption and the other reconstructed input variables are normal.

To prevent this risk of error, the outlier detection unit 300 according to an embodiment of the present application may be implemented through a variational autoencoder. In this case, an outlier is determined using only the reconstruction error of the preprocessed power data (Data_Pp), rather than by summing the reconstruction errors of all input variables, so the outlier detection unit 300 according to the embodiment of the present application may achieve improved outlier detection performance.
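The difference between the two decision rules described above can be illustrated with a minimal sketch. It does not implement a variational autoencoder; it only assumes that some trained model has already produced a reconstruction of each input variable. The variable names, the squared-error measure, and the threshold value are assumptions made for illustration and do not come from the specification.

```python
def squared_errors(original, reconstructed):
    """Per-variable squared reconstruction error for one time step."""
    return {name: (original[name] - reconstructed[name]) ** 2 for name in original}


def is_outlier_all_variables(original, reconstructed, threshold):
    """Baseline rule: sum the reconstruction errors of every input variable."""
    return sum(squared_errors(original, reconstructed).values()) > threshold


def is_outlier_power_only(original, reconstructed, threshold, power_key="power"):
    """Rule described for unit 300: use only the power variable's error."""
    return squared_errors(original, reconstructed)[power_key] > threshold


# Heavy-rain scenario: power is reconstructed well, precipitation is not.
# All values are hypothetical, normalized to [0, 1].
observed = {"power": 0.52, "temperature": 0.61, "precipitation": 0.95}
rebuilt = {"power": 0.50, "temperature": 0.60, "precipitation": 0.10}
THRESHOLD = 0.05  # hypothetical threshold

flag_all = is_outlier_all_variables(observed, rebuilt, THRESHOLD)
flag_power = is_outlier_power_only(observed, rebuilt, THRESHOLD)
print(flag_all, flag_power)
```

In this scenario the summed-error rule flags the time step because of the precipitation term alone, while the power-only rule does not.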

Referring to FIG. 5, it can be seen that the variational autoencoder-based outlier detection unit 300 according to an embodiment of the present application performs better than the other models. In FIG. 5, IQR, IForest, and LOF denote an outlier detection model using the interquartile range (IQR), the Isolation Forest outlier detection model, and the Local Outlier Factor outlier detection model, respectively.
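For context, the IQR baseline named in FIG. 5 is commonly implemented with the 1.5 x IQR fence rule. The sketch below shows that common form; the specification does not state which fence multiplier or quantile method was used in the experiment, so both are assumptions here, and the sample values are invented.

```python
import statistics


def iqr_outlier_indices(values, multiplier=1.5):
    """Return indices of values outside [Q1 - m*IQR, Q3 + m*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical load readings with one obvious spike at index 7.
readings = [10, 11, 12, 11, 10, 12, 11, 50]
flagged = iqr_outlier_indices(readings)
print(flagged)
```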

FIG. 6 is a diagram showing the outlier restoration unit 400 of FIG. 1 in more detail, and FIG. 7 is a graph showing the results of a restoration experiment for the outlier restoration unit 400.

Referring to FIG. 6, the outlier restoration unit 400 receives the normal data and the abnormal data from the outlier detection unit 300. The outlier restoration unit 400 may restore the abnormal data into restored data (Repair Data) based on the other input variables at the time point at which the abnormal data was detected.

In an embodiment of the present application, the outlier restoration unit 400 may be implemented using a random forest (RF) model. That is, the outlier restoration unit 400 may train a random forest model using the normal data, and may generate the restored data from the values obtained when the input variables at the time point at which an outlier occurred are fed into the trained random forest model. A random forest model trains each decision tree on a subset of the data set. Accordingly, the outlier restoration unit 400 is less prone to overfitting and may achieve improved performance even when there are many input variables.
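The repair flow described above (fit on normal time steps, then predict power at the abnormal time steps from the other input variables) can be sketched as follows. To stay self-contained, the sketch uses bagged one-split regression trees with random feature subsets, which is only a toy stand-in for a random forest and not the configuration of the embodiment. The feature layout, tree count, and data values are all hypothetical.

```python
import random


def fit_stump(rows, targets, feature_indices):
    """Fit a one-split regression tree using the candidate features."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in feature_indices:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = sum((t - left_mean) ** 2 for t in left)
            sse += sum((t - right_mean) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    if best is None:  # no split possible: predict the sample mean everywhere
        mean = sum(targets) / len(targets)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Bagging: each tree sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, round(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Hypothetical normal time steps: (normalized temperature, hour encoding) -> power.
normal_features = [(0.2, 0.0), (0.3, 0.5), (0.6, 1.0),
                   (0.8, 0.5), (0.9, 0.0), (0.4, -0.5)]
normal_power = [0.30, 0.35, 0.60, 0.75, 0.80, 0.40]
# Input variables observed at the time step whose power value was flagged.
abnormal_features = [(0.7, 0.8)]

forest = fit_forest(normal_features, normal_power)
repaired_power = [predict_forest(forest, row) for row in abnormal_features]
print(repaired_power)
```

Because every tree predicts an average of observed normal power values, the repaired value always stays within the range of the normal data, unlike a zero fill.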

Referring to FIG. 7, it can be seen that the random forest-based outlier restoration unit 400 according to an embodiment of the present application performs better than the other approaches. In FIG. 7, Zero, Linear, and RF denote the cases in which outliers are restored using zero interpolation, linear interpolation, and a random forest, respectively.
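The two baselines compared in FIG. 7 can be sketched briefly. "Zero interpolation" is interpreted here as replacing each outlier with zero, which is an assumption since the specification does not define it; linear interpolation uses the nearest normal neighbours on each side. The function names and sample series are hypothetical.

```python
def repair_zero(series, outlier_indices):
    """Replace every flagged value with zero."""
    bad = set(outlier_indices)
    return [0.0 if i in bad else v for i, v in enumerate(series)]


def repair_linear(series, outlier_indices):
    """Replace flagged values by interpolating between normal neighbours."""
    bad = set(outlier_indices)
    good = [i for i in range(len(series)) if i not in bad]
    repaired = list(series)
    for i in sorted(bad):
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None and right is None:
            continue  # nothing to interpolate from
        if left is None:
            repaired[i] = series[right]  # leading gap: copy next normal value
        elif right is None:
            repaired[i] = series[left]   # trailing gap: copy last normal value
        else:
            w = (i - left) / (right - left)
            repaired[i] = series[left] + w * (series[right] - series[left])
    return repaired


load = [10.0, 12.0, 99.0, 16.0, 18.0]  # index 2 is a hypothetical outlier
zero_fixed = repair_zero(load, [2])
linear_fixed = repair_linear(load, [2])
print(zero_fixed, linear_fixed)
```

Neither baseline uses the weather or time variables at the affected time step, which is the information the random forest approach exploits.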

FIG. 8 is a diagram showing the power demand prediction unit 500 of FIG. 1 in more detail, and FIG. 9 is a graph showing the results of an experiment for determining the window size of the power demand prediction unit 500.

Referring to FIG. 8, the power demand prediction unit 500 may receive the normal data and the restored data on power consumption from the outlier restoration unit 400, and may receive first and second weather data (Data_W1, Data_W2), categorical time data (Data_T), and periodic time data (Data_T1, Data_T2) from the data preprocessing unit 200. Based on the received input data, the power demand prediction unit 500 may train a power demand prediction model and predict future power demand using the trained model.

In an embodiment of the present application, the power demand prediction unit 500 may be implemented through a sliding window-based LightGBM model. By applying the sliding window technique, the power demand prediction unit 500 can appropriately reflect the latest trends and patterns.

For example, in FIG. 8, each dot may represent one day of input data. In this case, because the power demand prediction unit 500 uses the input data of the week preceding the time point to be predicted, it can reflect the latest data and thus achieve good performance.

In an embodiment of the present application, the power demand prediction unit 500 may set the window size to 7. Referring to FIG. 9, performance improves as the window size increases from 1 to 7, but once the window size exceeds 7 the improvement is marginal while the training time keeps increasing. This is because power demand data tends to repeat in a partly periodic pattern. That is, when the time point to be predicted is a Monday, supplying the data of the previous Monday as an input variable allows the periodic pattern to be reflected, which can improve performance. To reflect this periodic pattern in power demand, the power demand prediction unit 500 according to the embodiment of the present application sets the window size to 7.
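One way to read the window described above is that the seven days preceding the target day become the input variables for that day. The sketch below builds such samples from a daily series; the function name, the use of a single value per day, and the dummy data are assumptions, and the actual embodiment also feeds weather and time variables.

```python
def make_windows(daily_values, window_size=7):
    """Pair each day with the `window_size` days that precede it."""
    samples = []
    for t in range(window_size, len(daily_values)):
        history = daily_values[t - window_size:t]  # previous week, oldest first
        target = daily_values[t]                   # day to be predicted
        samples.append((history, target))
    return samples


# Ten days of dummy load values: 1.0, 2.0, ..., 10.0.
daily_load = [float(d) for d in range(1, 11)]
samples = make_windows(daily_load)
for history, target in samples:
    print(history, "->", target)
```

With a window size of 7, the first element of each history falls on the same weekday as the target, which matches the weekly periodicity argument given for FIG. 9.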

Meanwhile, the sliding window technique has the drawback that the model must be rebuilt for each time point to be predicted. To compensate for this drawback, the power demand prediction unit 500 according to an embodiment of the present application may be implemented using a LightGBM model, which builds models quickly while also offering strong prediction performance. In this case, the power demand prediction unit 500 can further speed up model construction by using the Gradient-based One-Side Sampling (GOSS) technique, which draws information primarily from the data instances with large gradients, and the Exclusive Feature Bundling (EFB) technique, which bundles mutually exclusive variables and processes them together.
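The sampling idea behind GOSS can be sketched as follows: keep every instance whose gradient magnitude is among the largest, randomly sample a fraction of the remaining instances, and up-weight the sampled ones by (1 - a) / b so that the overall gradient statistics stay roughly unbiased. This is an illustration of the idea, not LightGBM's internal code; the function name, rates, and gradient values are hypothetical.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {instance index: weight} for the instances kept by GOSS."""
    rng = random.Random(seed)
    n = len(gradients)
    # Sort instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]      # large-gradient instances are always kept
    rest = order[n_top:]     # small-gradient instances are subsampled
    sampled = rng.sample(rest, min(n_other, len(rest)))
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights


# Hypothetical gradients for ten training instances.
gradients = [0.9, -0.05, 0.02, -0.8, 0.01, 0.03, -0.04, 0.06, 0.02, -0.01]
selected = goss_sample(gradients)
print(selected)
```

Only three of the ten instances are used to evaluate the next split here, which is where the speed-up comes from.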

FIG. 10 is a graph comparing the prediction results of the power demand prediction apparatus 10 according to an embodiment of the present application with those of other models.

As shown in FIG. 10, when power demand was predicted for four different clusters, the power demand prediction apparatus 10 recorded a mean absolute percentage error (MAPE) of 4.545%, 3.755%, 2.7%, and 2.144%, respectively, confirming that it has better prediction performance than the other models.
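For reference, MAPE is the mean of the absolute errors expressed as a percentage of the actual values. A minimal sketch of the standard definition follows; the sample numbers are invented and unrelated to the results in FIG. 10.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


actual_load = [100.0, 200.0, 400.0]      # hypothetical measured demand
predicted_load = [110.0, 190.0, 400.0]   # hypothetical forecasts
error_percent = mape(actual_load, predicted_load)
print(round(error_percent, 3))
```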

FIG. 11 is a flowchart showing the operation of the power demand prediction apparatus 10 of FIG. 1.

In step S110, the data collection unit 100 may collect weather data, time data, and power data.

In step S120, the data preprocessing unit 200 may perform a preprocessing operation on the collected weather data, time data, and power data. For example, the data preprocessing unit 200 may perform a normalization operation on the weather data and the power data. As another example, the data preprocessing unit 200 may convert one-dimensional time data into two-dimensional time data.
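The two preprocessing operations in step S120 can be sketched as follows. This passage does not state which normalization is used or which periodic function maps time to two dimensions, so min-max scaling and a sine/cosine pair are assumptions chosen because they are common choices; the function names and values are hypothetical.

```python
import math


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; constant input maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_cyclic(value, period):
    """Map a one-dimensional cyclic value to a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


temperatures = [10.0, 20.0, 30.0]
normalized = min_max_normalize(temperatures)

hour_23 = encode_cyclic(23, 24)
hour_0 = encode_cyclic(0, 24)
print(normalized, hour_23, hour_0)
```

In the one-dimensional representation, 23:00 and 00:00 are 23 units apart; after the two-dimensional encoding they are close neighbours, which preserves the cyclic nature of time.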

In step S130, the outlier detection unit 300 may detect outliers in the preprocessed power data. For example, the outlier detection unit 300 may classify the power data into abnormal data and normal data, and may be implemented through a variational autoencoder (VAE).

In step S140, the outlier restoration unit 400 may restore the abnormal data. For example, the outlier restoration unit 400 may restore the abnormal data through a random forest model to generate restored data.

In step S150, the power demand prediction unit 500 may train a prediction model based on input data that includes the restored data. For example, the power demand prediction unit 500 may be implemented through a sliding window-based LightGBM model. Using the trained prediction model, the power demand prediction unit 500 can accurately predict power demand at a future time point.

Preferred embodiments of the present invention have been shown and described above. However, the present invention is not limited to the embodiments described above, and anyone having ordinary knowledge in the technical field to which the invention belongs may make various modifications without departing from the gist of the invention as set forth in the appended claims.

100: data collection unit
200: data preprocessing unit
300: outlier detection unit
400: outlier restoration unit
500: power demand prediction unit

Claims

An apparatus for predicting power demand, comprising:
a data collection unit that collects weather data, power data, and time data;
a data preprocessing unit that performs a preprocessing operation on the weather data, the power data, and the time data;
an outlier detection unit that is implemented through a variational autoencoder and classifies the preprocessed power data into abnormal data and normal data;
an outlier restoration unit that is implemented using a random forest model and restores the abnormal data to generate restored data; and
a power demand prediction unit that is implemented using a sliding window-based LightGBM model, trains a prediction model based on the normal data and the restored data, and predicts power demand.

The apparatus of claim 1, wherein the data preprocessing unit comprises:
a weather data preprocessing unit that performs a normalization operation on the weather data;
a power data preprocessing unit that performs a normalization operation on the power data; and
a time data preprocessing unit that converts the time data into two-dimensional time data.

The apparatus of claim 2, wherein the weather data preprocessing unit generates apparent temperature data and discomfort index data based on temperature, humidity, and wind speed data among the weather data, and performs a normalization operation on the generated apparent temperature data and discomfort index data.

The apparatus of claim 2, wherein the time data preprocessing unit converts the time data into two different pieces of two-dimensional time data using a periodic function.

The apparatus of claim 1, wherein the weather data collected by the data collection unit is any one of weather forecast data and measured weather data.

The apparatus of claim 1, wherein the power demand prediction unit has a window size of 7.

A method for predicting power demand, comprising:
collecting, in a data collection unit, weather data, time data, and power data;
performing, in a data preprocessing unit, a preprocessing operation on the weather data, the time data, and the power data;
detecting, in an outlier detection unit, outliers in the preprocessed power data;
restoring, in an outlier restoration unit, the outliers to generate restored data; and
training, in a power demand prediction unit, a prediction model using the restored data as training data, and predicting power demand based on the trained prediction model.

The method of claim 7, wherein performing the preprocessing operation comprises:
normalizing the weather data and the power data; and
converting the time data into two different pieces of time data using a periodic function.

The method of claim 8, wherein normalizing the weather data comprises:
generating apparent temperature data and discomfort index data based on temperature, humidity, and wind speed data among the weather data; and
performing a normalization operation on the generated apparent temperature data and the generated discomfort index data.

The method of claim 9, wherein the weather data is any one of weather forecast data and measured weather data.

The method of claim 7, wherein the outlier detection unit is implemented through a variational autoencoder.

The method of claim 7, wherein the outlier restoration unit is implemented through a random forest model.

The method of claim 7, wherein the power demand prediction unit is implemented through a sliding window-based LightGBM model.

The method of claim 13, wherein the power demand prediction unit has a window size of 7.