KR102593144B1

KR102593144B1 - Anomaly detection and repair based electrical load forecasting device and method

Info

Publication number: KR102593144B1
Application number: KR1020210177250A
Authority: KR
Inventors: 황인준; 박성우; 정승민; 정승원
Original assignee: 고려대학교 산학협력단
Priority date: 2021-12-13
Filing date: 2021-12-13
Publication date: 2023-10-23
Also published as: KR20230088967A

Abstract

The present invention relates to an apparatus and method for predicting power demand and, more specifically, to an apparatus and method that can accurately predict future power demand through machine learning. A power demand prediction device according to an embodiment of the present application includes: a data collection unit that collects weather data, power data, and time data; a data preprocessing unit that performs preprocessing operations on the weather data, the power data, and the time data; an outlier detection unit that is implemented through a variational autoencoder and classifies the preprocessed power data into abnormal data and normal data; an outlier restoration unit that is implemented through a random forest model and restores the abnormal data to generate restored data; and a power demand prediction unit that is implemented through a sliding window-based LightGBM model, trains a prediction model based on the normal data and the restored data, and predicts power demand. The power demand prediction device according to an embodiment of the present application detects outliers, restores them, and uses them as training data. By restoring outliers and using them as training data when the amount of data is insufficient, the power demand prediction device according to an embodiment of the present application can provide improved prediction performance without overfitting.

Description

Anomaly detection and restoration based power demand prediction device and method {ANOMALY DETECTION AND REPAIR BASED ELECTRICAL LOAD FORECASTING DEVICE AND METHOD}

The present invention relates to an apparatus and method for predicting power demand and, more specifically, to an apparatus and method that can accurately predict future power demand through machine learning.

The smart grid is attracting a great deal of attention as a feasible solution to the environmental and resource depletion problems occurring around the world. Highly accurate power demand forecasting is required in order to use efficiently the various systems that make up the smart grid, such as energy storage systems, energy management systems, and renewable energy systems.

With recent advances in computer technology, prediction models based on machine learning and deep learning are being actively studied and are showing good performance. These machine learning and deep learning based prediction models are strongly affected by the quantity and quality of the data.

If the data used to train a prediction model contains many outliers or missing values, training may be hindered and prediction accuracy may fall. When enough data has been collected for training, the model can be trained after the outliers and missing values are removed without any problem. However, if the outliers and missing values are removed when the amount of data is insufficient, overfitting may make it difficult to train the model at all.

An object of the present invention is to provide a power demand prediction device that can improve the performance of a prediction model without overfitting, through outlier detection and restoration, in situations where the amount of data is insufficient.

A power demand prediction device according to an embodiment of the present application includes: a data collection unit that collects weather data, power data, and time data; a data preprocessing unit that performs preprocessing operations on the weather data, the power data, and the time data; an outlier detection unit that is implemented through a variational autoencoder and classifies the preprocessed power data into abnormal data and normal data; an outlier restoration unit that is implemented through a random forest model and restores the abnormal data to generate restored data; and a power demand prediction unit that is implemented through a sliding window-based LightGBM model, trains a prediction model based on the normal data and the restored data, and predicts power demand.

In an embodiment, the data preprocessing unit includes: a weather data preprocessing unit that performs a normalization operation on the weather data; a power data preprocessing unit that performs a normalization operation on the power data; and a time data preprocessing unit that converts the time data into two-dimensional time data.

In an embodiment, the weather data preprocessing unit generates perceived temperature data and discomfort index data based on temperature, humidity, and wind speed data among the weather data, and performs a normalization operation on the generated perceived temperature data and discomfort index data.

In an embodiment, the time data preprocessing unit converts the time data into two different two-dimensional time data using a periodic function.

In an embodiment, the weather data collected by the data collection unit is either weather forecast data or measured weather data.

In an embodiment, the power demand prediction unit has a window size of 7.

A power demand prediction method according to an embodiment of the present application includes: collecting, in a data collection unit, weather data, time data, and power data; performing, in a data preprocessing unit, a preprocessing operation on the weather data, the time data, and the power data; detecting, in an outlier detection unit, outliers in the preprocessed power data; restoring, in an outlier restoration unit, the outliers to generate restored data; and, in a power demand prediction unit, training a prediction model using the restored data as training data and predicting power demand based on the trained prediction model.

In an embodiment, performing the preprocessing operation includes: normalizing the weather data and the power data; and converting the time data into two different time data using a periodic function.

In an embodiment, normalizing the weather data includes: generating perceived temperature data and discomfort index data based on temperature, humidity, and wind speed data among the weather data; and performing a normalization operation on the generated perceived temperature data and the generated discomfort index data.

In an embodiment, the weather data is either weather forecast data or measured weather data.

In an embodiment, the outlier detection unit is implemented through a variational autoencoder.

In an embodiment, the outlier restoration unit is implemented through a random forest model.

In an embodiment, the power demand prediction unit is implemented through a sliding window-based LightGBM model.

In an embodiment, the power demand prediction unit has a window size of 7.

The power demand prediction device according to an embodiment of the present application detects outliers, restores them, and uses them as training data. By restoring outliers and using them as training data when the amount of data is insufficient, the power demand prediction device according to an embodiment of the present application can provide improved prediction performance without overfitting.

FIG. 1 is a block diagram showing a power demand prediction device 10 according to an embodiment of the present application.
FIG. 2 is a block diagram showing the data preprocessing unit 200 of FIG. 1 in more detail.
FIG. 3A is a diagram showing an example of categorical time data (Data_T).
FIG. 3B is a diagram showing an example of two-dimensional time data (Data_T1, Data_T2).
FIG. 4 is a diagram showing the outlier detection unit 300 of FIG. 1 in more detail.
FIG. 5 is a graph showing the results of an outlier detection experiment with the outlier detection unit 300.
FIG. 6 is a diagram showing the outlier restoration unit 400 of FIG. 1 in more detail.
FIG. 7 is a graph showing the results of a restoration experiment with the outlier restoration unit 400.
FIG. 8 is a diagram showing the power demand prediction unit 500 of FIG. 1 in more detail.
FIG. 9 is a graph showing the results of an experiment for determining the window size of the power demand prediction unit 500.
FIG. 10 is a graph comparing the prediction results of the power demand prediction device 10 according to an embodiment of the present application with those of other models.
FIG. 11 is a flowchart showing the operation of the power demand prediction device 10 of FIG. 1.

Hereinafter, embodiments of the present application are described in more detail with reference to the accompanying drawings, in sufficient detail that a person of ordinary skill in the technical field of the present application can easily implement its technical idea.

FIG. 1 is a block diagram showing a power demand prediction device 10 according to an embodiment of the present application.

Referring to FIG. 1, the power demand prediction device 10 includes a data collection unit 100, a data preprocessing unit 200, an outlier detection unit 300, an outlier restoration unit 400, and a power demand prediction unit 500.

The data collection unit 100 collects the various data needed to predict power demand. For example, the data collection unit 100 may collect weather data, power data, and time data.

For example, the data collection unit 100 may collect data on various weather conditions from external sources. The weather data collected by the data collection unit 100 may be, for example, data on daily maximum temperature, daily minimum temperature, temperature, humidity, wind speed, total cloud cover, and precipitation. The data collection unit 100 may collect the weather data through, for example, the Korea Meteorological Administration's Open MET Data Portal.

For example, the data collection unit 100 may collect power data on the amount of power consumed from at least one cluster. In this case, the clusters may be buildings used for different purposes in order to prevent bias in the data. For example, the data collection unit 100 may collect power data from cluster A consisting of educational buildings, cluster B consisting of dormitories, cluster C consisting of engineering college laboratories, and cluster D consisting of science college laboratories. However, this is only an example, and the number and type of clusters may be set in various ways.

For example, when collecting weather and power data, the data collection unit 100 may also collect the corresponding time data. When collecting weather data, the data collection unit 100 may also collect time data for the month, day, hour, and minute. Likewise, when collecting power data, the data collection unit 100 may also collect time data for the month, day, hour, and minute.

In one embodiment of the present application, the data collection unit 100 may collect not only actually measured weather data but also weather forecast data that was issued a predetermined period before the corresponding past point in time. For example, when collecting power data and weather data for May 5, 2018, the weather data corresponding to May 5, 2018 may be weather forecast data, issued one day earlier on May 4, 2018, for the daily minimum temperature, daily maximum temperature, daily average temperature, temperature, humidity, wind speed, total cloud cover, precipitation, and the like. By collecting weather forecast data together with the actually measured weather data and providing both as training data for the learning model, the power demand prediction device 10 according to an embodiment of the present application can perform training that also accounts for the error between the weather forecast data and the actual weather data.

However, this is only an example. The data collection unit 100 may collect only actually measured weather data, in which case training may be performed using only the actually measured weather data. As another example, the data collection unit 100 may collect only weather forecast data, in which case training may be performed using only the weather forecast data.

The data preprocessing unit 200 receives the weather data, time data, and power data from the data collection unit 100. The data preprocessing unit 200 performs preprocessing operations on the received weather data, power data, and time data so that they can be used by the outlier detection unit 300.

For example, the data preprocessing unit 200 may receive weather data and power data from the data collection unit 100 and perform a normalization operation on them. In addition, the data preprocessing unit 200 may receive one-dimensional time data from the data collection unit 100 and convert it into two-dimensional time data. The configuration and operation of the data preprocessing unit 200 are described in more detail below with reference to FIGS. 2 and 3.

The outlier detection unit 300 receives the preprocessed power data from the data preprocessing unit 200. The outlier detection unit 300 may detect outliers in the preprocessed power data.

In one embodiment of the present application, the outlier detection unit 300 may detect outliers through a variational autoencoder (VAE). Because the variational autoencoder generates its output values by learning the distribution of the input values, the outlier detection unit 300 can better detect outliers that deviate from the typical power demand distribution. The outlier detection unit 300 is described in more detail below with reference to FIGS. 4 and 5.

The outlier restoration unit 400 receives the abnormal data from the outlier detection unit 300. The outlier restoration unit 400 restores the abnormal data based on the other input variables at the time point corresponding to the abnormal data, and thereby generates restored data.

In one embodiment of the present application, the outlier restoration unit 400 may generate the restored data from the abnormal data through a random forest (RF) model. The outlier restoration unit 400 is described in more detail below with reference to FIGS. 6 and 7.
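The restoration data flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and a one-nearest-neighbour predictor stands in for the random forest so that the example needs only the standard library. In practice a random forest regressor would be plugged in through the `fit` and `predict` arguments.

```python
def restore_outliers(rows, is_abnormal, fit, predict):
    """Replace abnormal power values with predictions from the other input variables.

    rows        : list of (features, power) pairs, one per time point
    is_abnormal : list of booleans from the outlier detection step
    fit/predict : any regressor interface (a random forest in the source;
                  a nearest-neighbour stand-in is used below)
    """
    normal = [(f, p) for (f, p), bad in zip(rows, is_abnormal) if not bad]
    model = fit([f for f, _ in normal], [p for _, p in normal])
    return [predict(model, f) if bad else p
            for (f, p), bad in zip(rows, is_abnormal)]


def fit_nearest(features, targets):
    # Stand-in "training": simply memorise the normal samples.
    return list(zip(features, targets))


def predict_nearest(model, query):
    # Stand-in "prediction": power of the closest normal sample.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, target = min(model, key=lambda pair: sq_dist(pair[0], query))
    return target


# Hypothetical normalized feature (e.g. temperature) and power values.
rows = [((0.1,), 1.0), ((0.6,), 9.9), ((0.9,), 3.0)]
flags = [False, True, False]
print(restore_outliers(rows, flags, fit_nearest, predict_nearest))
```

Only the flagged sample is rewritten; normal samples pass through unchanged, so the training set keeps its original size.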

The power demand prediction unit 500 receives input data including the restored data from the outlier restoration unit 400. Here, the input data includes the data preprocessed by the data preprocessing unit 200 and the data restored by the outlier restoration unit 400. The power demand prediction unit 500 may use the input data to train a model for predicting power demand.

In one embodiment of the present application, the power demand prediction unit 500 may be implemented through a sliding window-based LightGBM model. The power demand prediction unit 500 is described in more detail below with reference to FIGS. 8 and 9.

As described above, the power demand prediction device 10 according to an embodiment of the present application detects outliers through a variational autoencoder and restores them through a random forest model. Stable training data can therefore be provided even when the amount of data is insufficient, and the power demand prediction device 10 can accordingly provide improved prediction performance without overfitting. In addition, the power demand prediction device 10 implements its prediction model through a sliding window-based LightGBM model, which allows it to reflect the most recent data patterns close to the prediction time and thereby further improve prediction performance.
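One way to read "sliding window-based" is that the model is retrained on only the most recent window of samples before each prediction. The sketch below generates such train/test index pairs. The function name is hypothetical, the window size of 7 comes from the text, and the unit of one step (for example one day) is an assumption because this section does not state it. A LightGBM regressor would be fitted on each training window.

```python
def sliding_window_splits(n_samples, window_size=7):
    """Yield (train_indices, test_index) pairs.

    Each prediction target is preceded by exactly `window_size` training
    steps, so older data drops out as the window slides forward.
    """
    for test_index in range(window_size, n_samples):
        train_indices = list(range(test_index - window_size, test_index))
        yield train_indices, test_index


for train_idx, test_idx in sliding_window_splits(n_samples=10, window_size=7):
    print(train_idx, "->", test_idx)
```

Because every window ends immediately before the prediction target, the model always sees the latest pattern, which matches the motivation given above.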

FIG. 2 is a block diagram showing the data preprocessing unit 200 of FIG. 1 in more detail.

Referring to FIG. 2, the data preprocessing unit 200 includes a weather data preprocessing unit 210, a power data preprocessing unit 220, and a time data preprocessing unit 230.

The weather data preprocessing unit 210 receives weather data (Data_W) from the data collection unit 100. The weather data preprocessing unit 210 performs a preprocessing operation on the received weather data (Data_W) to generate first weather data (Data_W1).

For example, the weather data preprocessing unit 210 may receive data such as daily maximum temperature, daily minimum temperature, temperature, humidity, wind speed, total cloud cover, and precipitation from the data collection unit 100. The weather data preprocessing unit 210 may generate the first weather data (Data_W1) by performing a normalization operation on the received weather data.
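The text does not say which normalization is used. As an illustration only, the sketch below assumes min-max scaling to the range [0, 1], applied independently to each weather variable; the function name and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


temperature_c = [18.0, 24.0, 30.0]       # hypothetical readings
print(min_max_normalize(temperature_c))  # each variable is scaled separately
```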

In addition, the weather data preprocessing unit 210 may generate perceived temperature data and discomfort index data based on the received temperature, humidity, and wind speed data. The weather data preprocessing unit 210 may then normalize the perceived temperature data and the discomfort index data to generate second weather data (Data_W2).
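The source does not give the formulas for these derived variables. The sketch below assumes two widely used ones: the temperature-humidity discomfort index and the wind chill formula based on wind speed in km/h. Both formula choices and the function names are assumptions for illustration. The wind chill formula is intended for cold, windy conditions, so a different perceived-temperature formula may be appropriate in summer.

```python
def discomfort_index(temp_c, humidity_pct):
    """Discomfort index from air temperature (deg C) and relative humidity (%).

    Assumed formula: 0.81*T + 0.01*RH*(0.99*T - 14.3) + 46.3
    """
    return 0.81 * temp_c + 0.01 * humidity_pct * (0.99 * temp_c - 14.3) + 46.3


def wind_chill_c(temp_c, wind_speed_ms):
    """Perceived (wind chill) temperature in deg C.

    Assumed formula: 13.12 + 0.6215*T - 11.37*V^0.16 + 0.3965*V^0.16*T,
    with V in km/h; the input wind speed is in m/s and converted here.
    """
    v = (wind_speed_ms * 3.6) ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * v * temp_c


print(discomfort_index(30.0, 70.0))  # hot, humid afternoon
print(wind_chill_c(-5.0, 5.0))       # cold, windy morning
```

The two derived series would then be normalized in the same way as the raw weather variables to form Data_W2.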

The power data preprocessing unit 220 may receive, from the data collection unit 100, power data on the amount of power consumed in at least one cluster. The power data preprocessing unit 220 may perform a preprocessing operation on the received power data to generate preprocessed power data (Data_Pp).

The time data preprocessing unit 230 receives time data (Data_T) from the data collection unit 100. Here, the time data may be one-dimensional data that is well suited to reflecting categorical information. The time data preprocessing unit 230 may output the one-dimensional time data (Data_T) as is, or may convert it into two-dimensional time data (Data_T1, Data_T2) that can reflect periodicity information.

In more detail, one-dimensional time data reflects well the categorical information representing the month (such as January or February), the day (such as the 1st or 2nd), and the hour (such as 1 o'clock or 2 o'clock). However, one-dimensional time data does not reflect periodicity information well. For example, although 23:00 and 0:00 are consecutive hours, they differ by 23 in the one-dimensional data.

Therefore, so that the periodicity of time is properly reflected, the time data preprocessing unit 230 may convert the one-dimensional time data (Data_T) into two-dimensional time data (Data_T1, Data_T2). The time data preprocessing unit 230 may perform this conversion using periodic functions such as the sine (sin) and cosine (cos) functions. For example, the time data preprocessing unit 230 may convert the one-dimensional time data (Data_T) into the two-dimensional time data (Data_T1, Data_T2) through the following equations.

$$\text{time}_{\sin} = \sin\left(\frac{2\pi \cdot \text{time}}{\text{cycle}}\right), \qquad \text{time}_{\cos} = \cos\left(\frac{2\pi \cdot \text{time}}{\text{cycle}}\right)$$

Here, "cycle" represents the period of the time data. For example, when "time" is month data, "cycle" may be 12; when "time" is day data, "cycle" may be the number of days in the corresponding month; and when "time" is hour data, "cycle" may be 24. In addition, $\text{time}_{\sin}$ and $\text{time}_{\cos}$ may correspond to Data_T1 and Data_T2 of FIG. 2, respectively.

The time data preprocessing unit 230 uses two trigonometric values to form a two-dimensional representation for the following reason. If the time were expressed through a single trigonometric function with, for example, a period of 12, the same y value would be obtained for two different x values, and the time could not be identified from the y value alone. To resolve this, the time data preprocessing unit 230 generates the two-dimensional data (Data_T1, Data_T2) using two trigonometric functions, so that any two x values that share a y value under one function have different y values under the other.
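The cyclic encoding and the reason for using two functions can be checked with a short sketch. It assumes the common sine/cosine form with angle 2π·time/cycle; the function names are hypothetical.

```python
import math


def encode_cyclic(time_value, cycle):
    """Map a time value onto the unit circle: returns (Data_T1, Data_T2)."""
    angle = 2.0 * math.pi * time_value / cycle
    return math.sin(angle), math.cos(angle)


def cyclic_distance(a, b, cycle):
    """Euclidean distance between two encoded time values."""
    sa, ca = encode_cyclic(a, cycle)
    sb, cb = encode_cyclic(b, cycle)
    return math.hypot(sa - sb, ca - cb)


# 23:00 and 0:00 are as close as 0:00 and 1:00 after encoding.
print(cyclic_distance(23, 0, 24), cyclic_distance(0, 1, 24))

# With a period of 12, months 1 and 5 share the same sine value,
# so the cosine value is needed to tell them apart.
print(encode_cyclic(1, 12), encode_cyclic(5, 12))
```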

As described above, the data preprocessing unit 200 may perform preprocessing operations on the weather data (Data_W), the power data (Data_P), and the time data (Data_T). In particular, the data preprocessing unit 200 may output both the one-dimensional time data (Data_T), which reflects categorical information well, and the two-dimensional time data (Data_T1, Data_T2), which reflects periodicity information well.

FIG. 3 is a diagram showing an example of the time data output by the time data preprocessing unit 230 of FIG. 2. Specifically, FIG. 3A shows an example of categorical time data (Data_T), and FIG. 3B shows an example of two-dimensional time data (Data_T1, Data_T2).

As shown in FIG. 3, the time data preprocessing unit 230 may perform a preprocessing operation on the time data (Data_T) to generate the two-dimensional time data (Data_T1, Data_T2).

FIG. 4 is a diagram showing the outlier detection unit 300 of FIG. 1 in more detail, and FIG. 5 is a graph showing the results of an outlier detection experiment with the outlier detection unit 300.

Referring to FIG. 4, the outlier detection unit 300 receives the preprocessed power data (Data_Pp) from the data preprocessing unit 200. The outlier detection unit 300 may detect outliers in the preprocessed power data (Data_Pp) and classify the data into normal data and abnormal data.

In an embodiment of the present application, the outlier detection unit 300 may be implemented through a variational autoencoder. That is, the outlier detection unit 300 may train the variational autoencoder using the preprocessed power data and detect outliers using the difference between the input value and the output value reconstructed by the trained variational autoencoder.

만약 이상치 탐지부가 일반 오토인코더(AE)를 통하여 구현한다면, 오토인코더 기반의 이상 탐지부는 모든 입력 변수를 재구성한 후에 각 입력 변수들의 재구성 오류를 모두 더한다. 이후, 오토인코더 기반의 이상 탐지부는 재구성 오류의 합이 일정 값 이상이 될 경우에 이상치라고 판단한다. 그러나, 이 경우, 실제 전력 소모량에 이상치가 발생한 경우가 아닐 때에도, 오토인코더 기반의 이상치 탐지부는 이상치라고 판단할 수 있는 위험이 있다. If the outlier detection unit is implemented through a general autoencoder (AE), the autoencoder-based anomaly detection unit reconstructs all input variables and then adds all the reconstruction errors of each input variable. Afterwards, the autoencoder-based anomaly detection unit determines that it is an outlier when the sum of reconstruction errors exceeds a certain value. However, in this case, even when an outlier does not occur in actual power consumption, there is a risk that the autoencoder-based outlier detection unit may determine it to be an outlier.

For example, suppose that heavy rain suddenly falls at a certain point in time during the summer rainy season. Because the autoencoder-based outlier detection unit considers the reconstruction errors of all input variables, a large error in the reconstructed rainfall-related variable may cause it to wrongly judge the sample to be an outlier, even though the other reconstructed input variables, including power usage, are normal.

To prevent this risk of error, the outlier detection unit 300 according to an embodiment of the present application may be implemented through a variational autoencoder. In this case, outliers are determined not by summing the reconstruction errors of all input variables but by using only the reconstruction error of the preprocessed power data (Data_Pp), so the outlier detection unit 300 according to an embodiment of the present application may have improved outlier detection performance.
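
The difference between the two scoring schemes can be illustrated with a minimal Python sketch. It assumes that a trained model has already produced a reconstruction `x_hat` of the input matrix `x`. The function name `flag_outliers`, the column layout, the squared-error score, and the threshold value are illustrative assumptions and are not specified in this application.

```python
import numpy as np

def flag_outliers(x, x_hat, threshold, power_col=None):
    """Return a boolean mask of rows flagged as outliers.

    power_col=None sums the squared reconstruction error over all
    variables (the general-autoencoder scheme); an integer uses only
    that column's error (the power-only scheme).
    """
    err = (np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)) ** 2
    score = err.sum(axis=1) if power_col is None else err[:, power_col]
    return score > threshold

# Hypothetical normalized rows: [power, rainfall]
x = np.array([[0.50, 0.10],
              [0.52, 0.95],   # sudden heavy rain, normal power
              [0.95, 0.10]])  # abnormal power
x_hat = np.array([[0.50, 0.10],
                  [0.51, 0.15],
                  [0.55, 0.10]])

all_vars = flag_outliers(x, x_hat, threshold=0.1)
power_only = flag_outliers(x, x_hat, threshold=0.1, power_col=0)
```

In this toy data, the heavy-rain row is flagged when all variables are summed but not when only the power column is scored, while the row with abnormal power is flagged in both cases.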

Referring to FIG. 5, it can be seen that the variational autoencoder-based outlier detection unit 300 according to an embodiment of the present application performs better than the other models. In FIG. 5, IQR, IForest, and LOF denote an outlier detection model using the interquartile range (IQR), an Isolation Forest outlier detection model, and a Local Outlier Factor outlier detection model, respectively.

FIG. 6 is a diagram showing the outlier restoration unit 400 of FIG. 1 in more detail, and FIG. 7 is a graph showing the results of a restoration experiment for the outlier restoration unit 400.

Referring to FIG. 6, the outlier restoration unit 400 receives the normal data (Normal Data) and the abnormal data (Abnormal Data) from the outlier detection unit 300. The outlier restoration unit 400 may restore the abnormal data into restored data (Repair Data) based on the other input variables at the time point at which the abnormal data was detected.

In one embodiment of the present application, the outlier restoration unit 400 may be implemented using a random forest (RF) model. That is, the outlier restoration unit 400 may train a random forest model on the normal data and generate the restored data from the values the trained model outputs when given the input variables at the time point at which an outlier occurred. A random forest model trains each decision tree on a subset of the data. Accordingly, the outlier restoration unit 400 can avoid overfitting and can also achieve improved performance even when there are many input variables.
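
The restoration step described above can be sketched as follows using scikit-learn's `RandomForestRegressor`. The function name `repair_outliers`, the number of trees, the synthetic data, and the injected spike value are illustrative assumptions; the application does not specify hyperparameters or a particular library.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def repair_outliers(features, power, is_abnormal, seed=0):
    """Replace power values flagged abnormal with random forest predictions.

    The forest is trained on normal rows only, mapping the other input
    variables at each time step to the power value at that time step.
    """
    features = np.asarray(features, dtype=float)
    power = np.asarray(power, dtype=float)
    is_abnormal = np.asarray(is_abnormal, dtype=bool)

    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(features[~is_abnormal], power[~is_abnormal])

    repaired = power.copy()
    if is_abnormal.any():
        repaired[is_abnormal] = model.predict(features[is_abnormal])
    return repaired

# Synthetic example (assumed data): power depends on two of three features.
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 1.0, size=(200, 3))
power = 0.6 * features[:, 0] + 0.3 * features[:, 1] + 0.1
is_abnormal = np.zeros(200, dtype=bool)
is_abnormal[[10, 50, 120]] = True
power_observed = power.copy()
power_observed[is_abnormal] = 5.0  # injected spikes

repaired = repair_outliers(features, power_observed, is_abnormal)
```

Normal rows are left unchanged, and each repaired value is an average of normal training targets, so it stays within the range of the normal data.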

Referring to FIG. 7, it can be seen that the random forest-based outlier restoration unit 400 according to an embodiment of the present application performs better than the other methods. In FIG. 7, Zero, Linear, and RF denote the cases in which outliers are restored using zero interpolation, linear interpolation, and a random forest, respectively.

FIG. 8 is a diagram showing the power demand prediction unit 500 of FIG. 1 in more detail, and FIG. 9 is a graph showing the results of an experiment for determining the window size of the power demand prediction unit 500.

Referring to FIG. 8, the power demand prediction unit 500 may receive the normal data and the restored data on power consumption from the outlier restoration unit 400, and may receive first and second weather data (Data_W1, Data_W2), categorical time data (Data_T), and periodic time data (Data_T1, Data_T2) from the data preprocessing unit 200. The power demand prediction unit 500 may train a power demand prediction model based on the received input data and predict future power demand through the trained model.

In one embodiment of the present application, the power demand prediction unit 500 may be implemented through a sliding window-based LightGBM model. By applying the sliding window technique, the power demand prediction unit 500 can appropriately reflect the latest trends and patterns.

For example, in FIG. 8, each dot may represent one day's worth of input data. In this case, because the power demand prediction unit 500 uses the input data of the week preceding the time point to be predicted, it can reflect the latest data and achieve good performance.

In one embodiment of the present application, the power demand prediction unit 500 may set the window size to '7'. Referring to FIG. 9, performance improves as the window size increases from '1' to '7', but once the window size exceeds '7' the performance gain is marginal while the training time continues to increase. This is because power demand data tends to repeat in a partly periodic pattern. For example, if the time point to be predicted is a Monday, supplying the data of the previous Monday as an input variable allows the periodic pattern to be reflected, which can improve performance. The power demand prediction unit 500 according to an embodiment of the present application sets the window size to '7' in order to reflect this periodic pattern of power demand.
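
One way to read the sliding window with a window size of 7 is sketched below in Python: for each target day, the training window is the seven days immediately before it. The function name `sliding_window_splits` and the day-index representation are illustrative assumptions, and the application may organize the window differently.

```python
import numpy as np

def sliding_window_splits(n_days, window_size=7):
    """Yield (train_days, target_day) pairs for a sliding window.

    For each target day t, the training window is the window_size days
    immediately before t, so the window moves forward one day at a time.
    """
    for target in range(window_size, n_days):
        yield np.arange(target - window_size, target), target

splits = list(sliding_window_splits(n_days=10, window_size=7))
```

With ten days of data, this yields three splits: days 0 to 6 for target day 7, days 1 to 7 for target day 8, and days 2 to 8 for target day 9. A new model would be fitted on each training window.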

On the other hand, the sliding window technique has the disadvantage that a new model must be constructed for each time point to be predicted. To compensate for this disadvantage, the power demand prediction unit 500 according to an embodiment of the present application may be implemented using a LightGBM model, which is fast to construct and has excellent prediction performance. In this case, the power demand prediction unit 500 can further speed up model construction by using the GOSS (Gradient-based One-Side Sampling) technique, which obtains information using only the portion of the data with large gradients, and the EFB (Exclusive Feature Bundling) technique, which bundles and processes mutually exclusive variables.
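
The sampling idea behind GOSS can be sketched in plain NumPy as follows. This is a simplified illustration rather than LightGBM's internal implementation: the function name `goss_sample` and the rates `top_rate` and `other_rate` are assumptions chosen for the example, and EFB is not shown.

```python
import numpy as np

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) chosen by gradient-based one-side sampling.

    Keeps the top_rate fraction of instances with the largest absolute
    gradient, randomly samples other_rate of all instances from the rest,
    and up-weights the sampled small-gradient instances by
    (1 - top_rate) / other_rate to compensate for the discarded ones.
    """
    g = np.abs(np.asarray(gradients, dtype=float))
    n = g.size
    n_top = int(round(top_rate * n))
    n_other = int(round(other_rate * n))

    order = np.argsort(-g)  # largest absolute gradient first
    top_idx = order[:n_top]
    rest_idx = order[n_top:]

    rng = np.random.default_rng(seed)
    other_idx = rng.choice(rest_idx, size=n_other, replace=False)

    indices = np.concatenate([top_idx, other_idx])
    weights = np.ones(indices.size)
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return indices, weights

gradients = np.linspace(-1.0, 1.0, 100)
idx, w = goss_sample(gradients)
```

With 100 instances and these rates, 30 instances are retained for the split search instead of 100, which is where the speed-up comes from.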

FIG. 10 is a graph comparing the prediction results of the power demand prediction device 10 according to an embodiment of the present application with those of other models.

As shown in FIG. 10, when power demand was predicted for four different clusters, the power demand prediction device 10 recorded MAPE (Mean Absolute Percentage Error) values of 4.545%, 3.755%, 2.7%, and 2.144%, respectively, confirming that it has better prediction performance than the other models.
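
For reference, MAPE is the mean of the absolute errors divided by the actual values, expressed as a percentage. A minimal Python sketch is given below; the function name `mape` and the sample values are illustrative and are not taken from the experiment, and the actual values are assumed to be nonzero.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual must be nonzero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

# Relative errors of 10%, 5%, and 0% average to 5%.
score = mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```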

FIG. 11 is a flowchart showing the operation of the power demand prediction device 10 of FIG. 1.

In step S110, the data collection unit 100 may collect weather data, time data, and power data.

In step S120, the data preprocessing unit 200 may perform preprocessing operations on the collected weather data, time data, and power data. For example, the data preprocessing unit 200 may perform a normalization operation on the weather data and the power data. As another example, the data preprocessing unit 200 may convert one-dimensional time data into two-dimensional time data.
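
The two preprocessing examples above can be sketched in Python as follows. The application does not state which normalization or which periodic function is used, so min-max scaling and a sine/cosine pair are assumptions here, as are the function names `min_max_normalize` and `encode_cyclic`.

```python
import numpy as np

def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

def encode_cyclic(values, period):
    """Map one-dimensional cyclic values to two-dimensional (sin, cos) points."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.column_stack([np.sin(angle), np.cos(angle)])

hours = np.arange(24)
hour_xy = encode_cyclic(hours, period=24)
temps = min_max_normalize([10.0, 15.0, 30.0])
```

The two-dimensional encoding places hour 23 next to hour 0 on the unit circle, which a one-dimensional hour index cannot express.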

In step S130, the outlier detection unit 300 may perform an operation of detecting outliers in the preprocessed power data. For example, the outlier detection unit 300 may classify the power data into abnormal data and normal data, and the outlier detection unit 300 may be implemented through a variational autoencoder (VAE).

In step S140, the outlier restoration unit 400 may restore the abnormal data. For example, the outlier restoration unit 400 may restore the abnormal data through a random forest model to generate restored data.

In step S150, the power demand prediction unit 500 may train a prediction model based on input data that includes the restored data. For example, the power demand prediction unit 500 may be implemented through a sliding window-based LightGBM model. Through the trained prediction model, the power demand prediction unit 500 can accurately predict power demand at a future time point.

Preferred embodiments of the present invention have been shown and described above. However, the present invention is not limited to the embodiments described above, and anyone with ordinary skill in the technical field to which the invention belongs may make various modifications without departing from the gist of the present invention as set forth in the appended claims.

100: Data collection unit
200: Data preprocessing unit
300: Outlier detection unit
400: Outlier restoration unit
500: Power demand prediction unit

Claims

A power demand prediction device comprising:
a data collection unit that collects weather data, power data, and time data;
a data preprocessing unit that performs preprocessing operations on the weather data, the power data, and the time data;
an outlier detection unit that is implemented through a variational autoencoder, detects outliers using only reconstruction errors of the preprocessed power data, and classifies the preprocessed power data into abnormal data and normal data;
an outlier restoration unit that is implemented through a random forest model and restores the abnormal data to generate restored data; and
a power demand prediction unit that is implemented through a sliding window-based LightGBM model, learns a prediction model based on the normal data and the restored data, and predicts power demand,
wherein the LightGBM model is constructed through a GOSS (gradient-based one-side sampling) technique that obtains information based on the gradient of the normal data and the restored data and an EFB (exclusive feature bundling) technique that bundles and processes mutually exclusive variables.

The power demand prediction device of claim 1,
wherein the data preprocessing unit comprises:
a weather data preprocessor that performs a normalization operation on the weather data;
a power data preprocessor that performs a normalization operation on the power data; and
a time data preprocessor that converts the time data into two-dimensional time data.

The power demand prediction device of claim 2,
wherein the weather data preprocessor generates perceived temperature data and discomfort index data based on temperature, humidity, and wind speed data among the weather data, and performs a normalization operation on the generated perceived temperature data and discomfort index data.

The power demand prediction device of claim 2,
wherein the time data preprocessor converts the time data into two different two-dimensional time data using a periodic function.

The power demand prediction device of claim 1,
wherein the weather data collected by the data collection unit is either weather forecast data or measured weather data.

The power demand prediction device of claim 1,
wherein the power demand prediction unit has a window size of 7.

A power demand prediction method comprising:
collecting, in a data collection unit, weather data, time data, and power data;
performing, in a data preprocessing unit, a preprocessing operation on the weather data, the time data, and the power data;
classifying, in an outlier detection unit implemented through a variational autoencoder that detects outliers using only reconstruction errors of the preprocessed power data, the preprocessed power data into abnormal data and normal data;
restoring, in an outlier restoration unit, the abnormal data to generate restored data; and
learning, in a power demand prediction unit implemented through a sliding window-based LightGBM model, a prediction model using the restored data as learning data, and predicting power demand based on the learned prediction model,
wherein the LightGBM model is constructed through a GOSS (gradient-based one-side sampling) technique that obtains information based on the gradient of the normal data and the restored data and an EFB (exclusive feature bundling) technique that bundles and processes mutually exclusive variables.

The power demand prediction method of claim 7,
wherein performing the preprocessing operation comprises:
normalizing the weather data and the power data; and
converting the time data into two different time data using a periodic function.

The power demand prediction method of claim 8,
wherein normalizing the weather data comprises:
generating perceived temperature data and discomfort index data based on temperature, humidity, and wind speed data among the weather data; and
performing a normalization operation on the generated perceived temperature data and the generated discomfort index data.

The power demand prediction method of claim 9,
wherein the weather data is either weather forecast data or measured weather data.

(Deleted)

The power demand prediction method of claim 7,
wherein the outlier restoration unit is implemented through a random forest model.

(Deleted)

The power demand prediction method of claim 7,
wherein the power demand prediction unit has a window size of 7.