KR20210092895A

KR20210092895A - Method and Server for Compensating a Missing Value in a Power Data

Info

Publication number: KR20210092895A
Application number: KR1020200006322A
Authority: KR
Inventors: 황인준; 정승원; 문지훈; 박성우
Original assignee: 한국전력공사; 고려대학교 산학협력단
Priority date: 2020-01-17
Filing date: 2020-01-17
Publication date: 2021-07-27

Abstract

The present invention relates to a server and method for compensating for missing values in power data so that the missing values are interpolated close to the actual measurements. According to an embodiment of the present invention, the method comprises the following steps: collecting power measurement data and information on factors affecting the power measurement data; pre-processing the collected power measurement data and the information on the factors into a form suitable for learning; using the pre-processed data to form a learning model which has a structure including sub-networks, receives the information on the factors as input, and outputs the power measurement data; training each sub-network of the learning model by using the measured power measurement data; training all of the sub-networks included in the learning model together; checking whether training of the learning model is complete; and estimating the missing power measurement data by using the trained learning model.

Description

{Method and Server for Compensating a Missing Value in a Power Data}

The present invention relates to a server and method for compensating for missing values in power data that is collected continuously over time, and in particular to a server and method that compensate for missing values in power consumption data by taking into account factors that can affect power consumption.

An energy management system is equipped with various sensors, including smart meters, which measure the amount of electrical energy consumed. These sensors collect information on managed targets such as houses, buildings, and cities, and that information is used to operate the energy management system efficiently.

Smart meter measurements are used not only as an indicator of current power consumption but also as an input for forecasting future power consumption.

Because the energy management system draws up its energy management plan based on the forecast results, accurate forecasting of power consumption is an important factor in operating the system. In addition, because forecast accuracy depends heavily on data quality, collecting well-formed smart meter data is essential.

For two-way wireless communication, in which signals from multiple sensors on facilities or equipment located several hundred meters or more away are sent to a host and control signals such as power on/off commands are sent from the host to the remote equipment, the international wireless standard LoRa is sometimes applied. Besides LoRa, there are various IoT-dedicated communication schemes for transmitting smart meter measurement data, such as SigFox, LTE-M, and NB-IoT, and the major communication service providers compete by adopting different schemes.

When power is measured with a smart meter, measurements are often missing because of device malfunctions, signal transmission errors, and similar causes, which makes it difficult to collect high-quality power data.

Missing power measurements cause forecast errors due to the loss of information. In particular, forecasting methods that rely on continuous values, such as the autoregressive integrated moving average (ARIMA), suffer degraded performance.

The missing-data problem is addressed by filling the missing portion with appropriate values, and various imputation methods are being studied.

Linear interpolation uses temporally adjacent data: a straight line is drawn between the data points immediately before and after the missing values, and the values on that line are substituted for the missing values. However, its performance degrades severely when the missing interval is long.
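The linear interpolation described above can be illustrated with a minimal sketch. The function name `linear_interpolate` and the sample readings are hypothetical and are not taken from the patent; missing readings are represented as `None`.

```python
def linear_interpolate(values):
    """Fill None entries by drawing a straight line between the
    nearest known readings before and after each gap."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):
            # Value on the straight line joining the two known readings.
            step = (values[right] - values[left]) * k / gap
            filled[left + k] = values[left] + step
    return filled


# Hypothetical hourly readings with two gaps.
readings = [10.0, None, None, 16.0, 15.0, None, 11.0]
filled = linear_interpolate(readings)
print(filled)
```

Because every filled value lies on a straight line, any daily or weekly consumption pattern inside a long gap is lost, which is consistent with the drawback noted above.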

Another interpolation method uses the k-nearest neighbor algorithm to draw on the closest data. This method applies the nearest neighbor algorithm to the data points near the missing values in order to find similar patterns in the existing data, and then interpolates the missing portion based on the patterns found. However, because the result depends too heavily on past data, interpolation performance degrades when the data pattern is complex.

Korean Patent Application Publication No. 10-2018-0112540

An object of the present invention is to provide a server and method for compensating for missing values in power data that can interpolate the missing values close to the actual measured values.

Specifically, an object of the present invention is to provide a server and method for compensating for missing values in power data that can capture complex power demand patterns and thereby compensate for missing values more accurately.

Specifically, an object of the present invention is to provide a server and method for compensating for missing values in power data that construct variables which reflect time-series patterns well and build a better-optimized interpolation model as an ensemble of machine learning algorithms.

A method for compensating for missing values in power measurement data according to one aspect of the present invention may comprise: collecting power measurement data and information on factors affecting the power measurement data; pre-processing the collected power measurement data and the information on the factors into a form suitable for learning; using the pre-processed data to form a learning model having a structure that includes sub-networks, in which the information on the factors is the input and the power measurement data is the output; training each sub-network of the learning model by using the measured power measurement data; training all of the sub-networks included in the learning model together; checking whether training of the learning model is complete; and estimating the missing power measurement data by using the trained learning model.

Here, the power measurement data may be measured values of power consumption, and the factors may be weather data and calendar information.

Here, the learning model is an ensemble model, and in the pre-processing step the input item for each factor may be converted into a real value whose absolute value lies between 0 and 1, a positive real value, or a vector value, so that the input data of the ensemble learning model are normalized by min-max normalization.

Here, in the step of forming the learning model, the number of sub-networks to be included in the learning model may be determined and, after random sampling of the pre-processed data, sub-networks each having an ensemble weight may be formed.

Here, in the step of training each sub-network, training may be performed a predetermined number of times so as to minimize, for each sub-network, a loss function based on the difference between the actual power consumption data and the predicted energy consumption data.

Here, the loss function may be defined according to the following equation.

(where t is the actual electrical energy consumption data and y_i is the predicted energy consumption of sub-network SN_i)

Here, in the step of training all of the sub-networks together, it may be checked whether the ensemble weight of a sub-network is smaller than a predefined threshold (α_th), and if it is, that sub-network may be removed from the learning model.

Here, in the step of training all of the sub-networks together, training may be performed so as to minimize a loss function for the sub-networks as a whole, defined according to the following equation.

(where t is the actual electrical energy consumption data and y_out is the predicted energy consumption of the sub-networks as a whole)
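The adjustment of ensemble weights and the threshold-based removal described above can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the loss is taken to be the mean squared difference between t and y_out (the equation itself is not reproduced in this text), the ensemble weights are assumed to be the softmax of trainable variables w_i as in the detailed description, and the function names, learning rate, epoch count, and sample values are all hypothetical.

```python
import math


def softmax(w):
    m = max(w)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in w]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_loss(w, outputs, targets):
    """Assumed loss: mean squared error between t and y_out."""
    a = softmax(w)
    total = 0.0
    for y, t in zip(outputs, targets):
        y_out = sum(ai * yi for ai, yi in zip(a, y))
        total += (t - y_out) ** 2
    return total / len(targets)


def adjust(w, outputs, targets, epochs, lr=1.0):
    """Gradient descent on w only; sub-network outputs stay fixed."""
    w = list(w)
    for _ in range(epochs):
        a = softmax(w)
        grad = [0.0] * len(w)
        for y, t in zip(outputs, targets):
            y_out = sum(ai * yi for ai, yi in zip(a, y))
            for k in range(len(w)):
                # d(y_out)/d(w_k) = a_k * (y_k - y_out) for softmax weights
                grad[k] += 2.0 * (y_out - t) * a[k] * (y[k] - y_out)
        w = [wk - lr * g / len(targets) for wk, g in zip(w, grad)]
    return w


def surviving_subnetworks(w, alpha_th):
    """Indices of sub-networks whose ensemble weight is not below alpha_th."""
    return [i for i, ai in enumerate(softmax(w)) if ai >= alpha_th]


# Hypothetical data: each row holds the fixed outputs y_1..y_3 for one sample.
targets = [0.5, 0.5, 0.5]
outputs = [[0.5, 0.8, 0.9], [0.5, 0.2, 0.9], [0.5, 0.8, 0.1]]

w0 = [0.0, 0.0, 0.0]
loss_before = ensemble_loss(w0, outputs, targets)
w = adjust(w0, outputs, targets, epochs=1000)
loss_after = ensemble_loss(w, outputs, targets)
kept = surviving_subnetworks(w, alpha_th=0.05)
print(softmax(w), loss_before, loss_after, kept)
```

In this toy data the first sub-network predicts every target exactly, so its ensemble weight grows at the expense of the others. Whether the surviving weights are renormalized after removal is not stated in this text and is left out of the sketch.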

Here, in the step of checking whether training is complete, the result of the previous round of training each sub-network and training all of the sub-networks together may be compared with the result of the current round of those two steps, and training may be determined to be complete if the difference between the two results is less than a predetermined reference value.
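The completion check can be sketched as a simple comparison between consecutive rounds. The sketch assumes that the "result" of a round is a single scalar such as a loss value, which this text does not specify; the function names and the sample numbers are hypothetical.

```python
def training_complete(previous_result, current_result, reference_value):
    """True when two consecutive rounds differ by less than the reference value."""
    return abs(previous_result - current_result) < reference_value


def first_complete_round(results, reference_value):
    """Index of the first round whose result is close enough to the previous one."""
    for round_index in range(1, len(results)):
        if training_complete(results[round_index - 1],
                             results[round_index],
                             reference_value):
            return round_index
    return None  # training never settled within the recorded rounds


# Hypothetical per-round results (for example, a loss after each round).
results = [0.50, 0.20, 0.12, 0.119, 0.1189]
stop_round = first_complete_round(results, reference_value=0.005)
print(stop_round)
```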

A server for compensating for missing values in power measurement data according to another aspect of the present invention may comprise: a data collection unit that collects power measurement data and information on factors affecting the power measurement data; a data pre-processing unit that normalizes the collected data and information by min-max normalization; a learning model forming unit that uses the pre-processed data to form a learning model having a structure that includes sub-networks, in which the information on the factors is the input and the power measurement data is the output; a training unit that trains the formed learning model by using the measured power measurement data; and a data interpolation unit that estimates the missing power measurement data by using the trained learning model.

Here, the power measurement data may be measured values of power consumption, and the factors may be weather information including temperature and calendar information including month, day, and day of the week.

Here, the data pre-processing unit may convert the weather information into numerical data between 0 and 1 through min-max normalization, convert the month and day information into vector values whose components have absolute values between 0 and 1, and convert the day-of-week information into integer values of 0 and 1.

Here, the training unit may train each sub-network of the learning model by using the measured power measurement data, train the weight of each sub-network within the learning model, and determine whether training of the learning model is complete.

Here, the training unit may compare the result of the previous round of training each sub-network and training all of the sub-networks together with the result of the current round of those two steps, and may determine that training is complete if the difference between the two results is less than a predetermined reference value.

Implementing the server and method for compensating for missing values in power data according to the configuration described above has the advantage that missing values in the power data can be interpolated close to the actual measured values.

The server and method of the present invention have the advantage that input variables effective for learning power demand patterns can be constructed through data pre-processing.

The server and method of the present invention extract data over arbitrary periods to train multilayer perceptron models and use the best-performing of the trained multilayer perceptrons to build a prediction model optimized for the power data in question, which has the advantage that the power consumption during the missing period can be predicted accurately.

FIG. 1 is a conceptual diagram of an algorithm showing the process configuration of a system for compensating for missing values in power data according to the spirit of the present invention.
FIG. 2 is a conceptual diagram representing the assembled structure of the ensemble model.
FIG. 3 is a conceptual diagram illustrating the flow required to calculate EMAk for the intermediate check.
FIG. 4 is a block diagram illustrating a server for compensating for missing values in power measurement data according to the spirit of the present invention.

In describing the present invention, terms such as first and second may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

When a component is described as being connected or coupled to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression may include the plural unless the context clearly indicates otherwise.

In this specification, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

FIG. 1 is a conceptual diagram illustrating the process configuration of a system for compensating for missing values in power data according to the spirit of the present invention.

The method for compensating for missing values according to the illustrated process may consist of five stages: data set preparation, model preparation, an enhancement stage, an adjustment stage, and an intermediate check. Detailed examples of each stage are described below.

More specifically, the illustrated method for compensating for missing values in power measurement data may comprise: collecting power measurement data and information on factors affecting the power measurement data (S10); pre-processing the collected power measurement data and the information on the factors into a form suitable for learning (S15), in which a data set for training the ensemble model may be constructed; using the pre-processed data to form a learning model having a structure that includes sub-networks, in which the information on the factors is the input and the power measurement data is the output (S20); training each sub-network of the learning model by using the measured power measurement data (S30); training all of the sub-networks included in the learning model together (S40); checking whether training of the learning model is complete (S50); and estimating the missing power measurement data by using the trained learning model (S60).

The step of collecting the data and information (S10) may be performed, as shown in FIG. 1, in a system having a central collection device that records power-related measurements from a number of distributed power meters and collects and records weather information as factors affecting power consumption, by reading the measurements recorded in the central collection device. In another implementation, step S10 may receive the measurements directly from each meter or weather measuring device.

Next, the preparation of the training data set performed in the pre-processing step (S15) is described.

Steps S10 and S15 are described below for the specific case in which the power measurement data are measured values of power consumption and the factors are weather data and calendar information.

The scheme proposed according to the spirit of the present invention may be based on an ensemble model containing several multilayer perceptrons (MLPs). The first stage is therefore to collect data and construct a data set for training the ensemble model. For example, when electrical energy consumption data are collected, the collected data become the output variable of the ensemble model. Explanatory data such as weather and calendar data are collected to construct the input variables of the ensemble model. These data must be measured or processed for every measurement period of the SM (sensor module). For example, if the SM records energy consumption every hour, the explanatory data must contain information for every hour. The weather data may include temperature, humidity, wind speed, the temperature-humidity index (THI), and the wind chill index (WCI). Because these factors are used in practice to forecast energy consumption, they may be selected as input variables. The calendar data may include information about the measurement time, such as the timestamp, the work schedule of the managed target, and holiday dates.

The calendar data require some pre-processing. First, the values for season, month, day, hour, minute, and day of the week are derived from the timestamp. Depending on the implementation, minutes may be ignored when the measurement interval is one hour or longer. Next, time units such as month, day, and hour may be pre-processed to reflect their periodicity. The equations used for this are Equation 1 and Equation 2 below.

Here, time is the time unit to be converted and period_time is its period. For example, when converting the month, period_time is 12. When converting the day, period_time may vary from 28 to 31 depending on the month.
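Equations 1 and 2 are not reproduced in this text. A common way to reflect periodicity, assumed here purely for illustration, is to map each time unit to a sine and cosine pair so that the end of a period lies next to its beginning. The function name `encode_cyclic` and the choice of a 1-based day index are hypothetical.

```python
import calendar
import math


def encode_cyclic(time, period_time):
    """Map a cyclic time unit onto the unit circle.

    Assumed form: (sin(2*pi*time/period_time), cos(2*pi*time/period_time)).
    Both components have an absolute value between 0 and 1.
    """
    angle = 2.0 * math.pi * time / period_time
    return math.sin(angle), math.cos(angle)


# Month: period_time is always 12.
month_x, month_y = encode_cyclic(3, 12)

# Day: period_time depends on the month (28 to 31 days).
days_in_feb_2020 = calendar.monthrange(2020, 2)[1]
day_x, day_y = encode_cyclic(15, days_in_feb_2020)

print(month_x, month_y, day_x, day_y)
```

With this encoding, December and January end up close together in the input space, which a single integer month number would not achieve.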

Next, the season and the day of the week are each represented as a one-hot vector, in which exactly one element is 1 and all others are 0.
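As a small illustration of the one-hot representation, the sketch below encodes a season label and the weekday of a date. The category orderings and the helper name `one_hot` are assumptions made for the example; how a timestamp is mapped to a season is not specified in this text and is left to the caller.

```python
import datetime

SEASONS = ["spring", "summer", "autumn", "winter"]
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def one_hot(value, categories):
    """Return a vector with 1 at the position of value and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


timestamp = datetime.datetime(2020, 1, 17, 9, 0)
# datetime.weekday() returns 0 for Monday through 6 for Sunday.
weekday_vec = one_hot(WEEKDAYS[timestamp.weekday()], WEEKDAYS)
season_vec = one_hot("winter", SEASONS)
print(weekday_vec, season_vec)
```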

Next, an indicator is added that shows whether the measurement time falls within the working hours of the managed target.

Finally, a variable is added that indicates whether the measurement time falls on a holiday. After the calendar data have been pre-processed, 26 input variables are ready, and the list of these variables may be organized as shown in Table 1 below.

The weather-related variables may take values greater than 1. To bring them into a range similar to that of the other variables, min-max normalization may be performed so that these variables range from 0 to 1. The collected electrical energy consumption data may be normalized for the same reason. As a result, one output variable and 26 input variables are produced for each measurement time, and the complete data set is ready. Of all the data points in the data set, those whose output variable value is known are included in the training data set, and the ensemble model is then trained on these data points.
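The normalization and the selection of training rows can be sketched as follows. The helper name `min_max_normalize`, the sample values, and the use of `None` for a missing measurement are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Scale known values to [0, 1]; missing entries (None) stay missing."""
    known = [v for v in values if v is not None]
    lo, hi = min(known), max(known)
    span = hi - lo
    if span == 0:
        # All known values are equal; map them to 0.0 to avoid dividing by zero.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lo) / span for v in values]


# Hypothetical hourly temperature (input) and consumption (output) readings.
temperature = [-5.0, 0.0, 10.0, 25.0]
consumption = [120.0, None, 80.0, 200.0]

temperature_n = min_max_normalize(temperature)
consumption_n = min_max_normalize(consumption)

# Rows with a known output form the training set; the rest are to be imputed.
train_rows = [i for i, y in enumerate(consumption_n) if y is not None]
missing_rows = [i for i, y in enumerate(consumption_n) if y is None]
print(temperature_n, consumption_n, train_rows, missing_rows)
```

An estimate produced for a missing row is on the normalized scale and would need to be mapped back with the same minimum and maximum to obtain a consumption value in the original units.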

As the specific example above shows, the learning model is an ensemble model, and in the pre-processing step (S15) the input item for each factor is converted into a real value whose absolute value lies between 0 and 1, a positive real value, or a vector value, so that the input data of the ensemble learning model are normalized by min-max normalization.

Next, the preparation of the learning model performed in the step of forming the learning model (S20) is described.

To construct the ensemble model, the number of sub-networks that make up the model, denoted N, is determined first. The more sub-networks the model has, the more likely it is that a better model is obtained at the end of training. However, the training time grows in proportion to N. Experiments indicate that the proposed model achieves satisfactory results when N is set to 20 or more.

After N has been set, random sampling is generally performed to construct a new data set for each sub-network. The sampling produces N different new data sets, and each sub-network owns one of them. Each sub-network is trained on its own data set, so that after training the sub-networks yield different predictions, which strengthens the effect of the ensemble. This approach is generally called bootstrap aggregating, or bagging. A well-known example of bagging is the random forest (RF), a tree-based ensemble model that uses bagging to grow diverse trees. Its strong generalization and accurate prediction performance have been reported in several studies, which motivates adopting bagging for the present model.
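The random sampling step can be sketched as follows. Sampling with replacement and keeping each new data set the same size as the original are the usual bagging conventions and are assumed here; the text itself only states that random sampling is performed. The function name and the seed parameter are hypothetical.

```python
import random


def bootstrap_datasets(dataset, n_subnetworks, seed=0):
    """Draw one resampled data set per sub-network (with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    size = len(dataset)
    return [
        [dataset[rng.randrange(size)] for _ in range(size)]
        for _ in range(n_subnetworks)
    ]


# Hypothetical training rows, identified here only by their index.
training_rows = list(range(100))
subsets = bootstrap_datasets(training_rows, n_subnetworks=20)
print(len(subsets), len(subsets[0]), len(set(subsets[0])))
```

Because rows are drawn with replacement, each sub-network sees some rows more than once and misses others, which is what makes the trained sub-networks differ from one another.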

In the step of forming the learning model (S20), the number of sub-networks to be included in the learning model is determined and, after random sampling of the pre-processed data, sub-networks each having an ensemble weight are formed.

FIG. 2 shows the assembled structure of the ensemble model.

The N_SN sub-networks are created after the random sampling. In addition to the weights and biases of its MLP, each sub-network (SN_i, i = 1, 2, 3, …, N) has an ensemble weight a_i. The weights a_i are used to compute the output of the ensemble model, which is the weighted average of the outputs y_i of the sub-networks. To guarantee a weighted average, the ensemble weights must satisfy two conditions: 1) every ensemble weight must lie in the interval [0, 1] (a_i ∈ [0, 1]); and 2) the ensemble weights must sum to 1 (Σa_i = 1).

When designing a deep learning model, it is difficult to find a way to train the model while satisfying both conditions. The softmax function, which is commonly used in classification tasks, always satisfies them. Therefore, a new variable w_i is introduced and a_i is determined by applying the softmax function to it. The softmax function may be defined by Equation 3 below.

$$a_i = \frac{e^{w_i}}{\sum_{j=1}^{N} e^{w_j}} \qquad (3)$$

Because of the properties of the softmax function described above, the ensemble weights always satisfy the two conditions. Accordingly, training of the model does not adjust a_i directly but adjusts w_i instead. A change in w_i affects a_i, which makes it possible to find an appropriate value of a_i. For convenience, the remainder of the description speaks loosely as if a_i were adjusted directly.
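The relationship between w_i, a_i, and the ensemble output can be illustrated with a short sketch. The standard softmax definition is used; the function names and the sample values are hypothetical.

```python
import math


def softmax(w):
    """Turn unconstrained variables w_i into ensemble weights a_i."""
    m = max(w)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in w]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_output(w, y):
    """Weighted average of the sub-network outputs y_i."""
    a = softmax(w)
    return sum(ai * yi for ai, yi in zip(a, y))


w = [0.0, 1.0, -2.0]    # freely adjustable during training
y = [0.40, 0.50, 0.90]  # outputs of three sub-networks for one sample
a = softmax(w)
y_out = ensemble_output(w, y)
print(a, y_out)
```

Whatever values w takes, every a_i stays in [0, 1] and the a_i sum to 1, so the ensemble output always lies between the smallest and largest sub-network outputs.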

Two parts of this model must be optimized: 1) the sub-networks and 2) the ensemble weights. Initially all parameters, including the sub-network parameters and the ensemble weights, are assigned randomly, and the two parts are then optimized as the following two-stage training proceeds.

Next, the enhancement process for the formed learning model, that is, the step (S30) of performing learning on each sub-network, will be described.

In the step (S30) of performing learning on each sub-network, learning is performed a predetermined number of times in the direction that minimizes a loss function based on the difference between the actual power consumption data and the predicted energy consumption data for each sub-network.

Specifically, in the learning of step S30 (the first learning stage), each sub-network is trained independently. The loss function of SN_i (loss_ES,i) may be defined by Equation 4 below.

Here, t is the actual electrical energy consumption data and y_i is the energy consumption predicted by SN_i. Each sub-network learns its own data set to minimize its loss function, without interfering with the others. In this stage, therefore, only the MLP parameters change while the ensemble weights stay fixed. Once every sub-network has been trained for Epoch_ES epochs, this stage is complete and the second learning stage, the adjustment stage, begins.
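Equation 4 is not reproduced in this text. One form consistent with the description, a mean squared error over the samples of the data set assigned to SN_i, is sketched below. The squared-error form and the symbols M_i (number of samples) and m (sample index) are assumptions, not the confirmed equation.

```latex
\mathrm{loss}_{ES,i} = \frac{1}{M_i} \sum_{m=1}^{M_i} \left( t_m - y_{i,m} \right)^2
```

Since this loss depends only on y_i, minimizing it changes the parameters of SN_i alone and leaves the ensemble weights untouched.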

Next, the adjustment process for the sub-networks as a whole, that is, the step (S40) of performing learning on all of the sub-networks included in the learning model, will be described.

In this process, the ensemble weights are adjusted, and unnecessary sub-networks are removed according to these weights. Unlike the enhancement stage, only the ensemble weights can change during ensemble weight adjustment; the sub-network parameters are fixed. This means that y_i cannot change in this process. The loss function of the adjustment stage is given by Equation 5 below.
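The adjustment stage can be sketched as gradient descent on the softmax variables w_i with the sub-network outputs held fixed. Equation 5 is not reproduced in this text, so the loss below is an assumed mean squared error between t and the ensemble output y_out = Σ a_i y_i. The function names, learning rate, and toy data are illustrative only.

```python
import numpy as np

def softmax(w):
    e = np.exp(w - np.max(w))
    return e / e.sum()

def adjust_weights(w, y, t, epochs=200, lr=0.5):
    """Gradient descent on w only; the sub-network outputs y stay fixed.

    w: (N,) free variables, y: (N, T) fixed predictions, t: (T,) targets.
    Loss (assumed): mean((t - a @ y) ** 2) with a = softmax(w).
    """
    w = np.array(w, dtype=float)
    for _ in range(epochs):
        a = softmax(w)
        residual = t - a @ y                      # (T,)
        grad_a = -2.0 * (y @ residual) / t.size   # d loss / d a
        grad_w = a * (grad_a - a @ grad_a)        # chain rule through softmax
        w -= lr * grad_w
    return w

t = np.array([1.0, 2.0, 3.0, 4.0])
y = np.stack([t + 0.05, t + 1.0, t + 2.0])  # SN_1 is the most accurate
a = softmax(adjust_weights(np.zeros(3), y, t))
print(a)  # the largest weight goes to the first sub-network
```

Only `w` is updated inside the loop, which mirrors the statement that y_i cannot change during this stage.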

The data set used for this learning is the original data set. That is, it contains every data point of the training data set.

Ensemble weight learning may be performed for Epoch_AS epochs. The ensemble weight of each sub-network is then checked against a predefined threshold (α_th). If the weight is smaller than the threshold, the sub-network is removed from the ensemble model. The reason is that a sub-network with a low ensemble weight contributes little to accurate prediction compared with one with a high ensemble weight, so it is removed to cut unnecessary training time.

For example, with a_1 = 0.01, a_2 = 0.05, a_3 = 0.15, and a_4 = 0.79, setting α_th = 0.02 removes SN_1 from the ensemble model. With α_th = 0.1, both SN_1 and SN_2 are removed. Thus α_th affects the number of sub-networks that remain after training. If a lightweight ensemble model is needed, α_th can be set high, but if α_th is too large, accuracy may degrade significantly.
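The thresholding rule in this example can be expressed in a few lines. The function name `prune` is illustrative, and the sketch returns 1-based sub-network indices to match the SN_1 … SN_4 naming.

```python
def prune(weights, alpha_th):
    """Return 1-based indices of sub-networks whose weight is not below alpha_th."""
    return [i for i, a in enumerate(weights, start=1) if a >= alpha_th]

a = [0.01, 0.05, 0.15, 0.79]
print(prune(a, 0.02))  # [2, 3, 4]: SN_1 is removed
print(prune(a, 0.1))   # [3, 4]: SN_1 and SN_2 are removed
```

If the surviving weights are recomputed by softmax over the remaining w_i, they would sum to 1 again without an explicit renormalization step; the text does not state this, so it is an inference.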

Next, the step (S50) of checking whether the learning is completed will be described.

After the low-influence sub-networks have been removed in step S40, an intermediate check may begin. The intermediate check may be repeated up to NMAX times, and the number of sub-networks decreases as the iterations continue. The time required for each step also decreases as sub-networks are removed. If the ensemble model is judged to be sufficiently trained during the iterations, the whole learning process may terminate immediately.

Until the number of iterations reaches NMAX, a check of whether learning is complete is performed each time an adjustment process over all of the sub-networks finishes. As a kind of intermediate check, its purpose is to determine whether the ensemble model needs any further training iterations and, if it does not, to end training.

If the ensemble model satisfies the condition in this process, the iteration loop terminates. Otherwise, the step of performing learning on each sub-network is performed again.
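The control flow of steps S30 to S50 can be sketched as a loop. All four callables (`train_each`, `adjust`, `prune`, `converged`) are hypothetical placeholders for the operations described in the text, not functions defined by the patent.

```python
def train_ensemble(subnets, train_each, adjust, prune, converged, n_max):
    """Alternate S30 and S40 until S50 reports convergence or n_max rounds pass.

    train_each, adjust, prune and converged are hypothetical callables
    supplied by the caller; n_max is assumed to be at least 1.
    """
    rounds = 0
    for k in range(1, n_max + 1):
        rounds = k
        for sn in subnets:                 # S30: train each sub-network on its own
            train_each(sn)
        weights = adjust(subnets)          # S40: update ensemble weights only
        subnets = prune(subnets, weights)  # S40: drop low-weight sub-networks
        if converged(k, weights):          # S50: intermediate check
            break
    return subnets, rounds
```

The early `break` is what lets training stop before NMAX rounds once the check succeeds.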

Because the iteration loop of the learning stage takes more time than the other parts of a machine learning algorithm, the intermediate check described above is needed to shorten the total time spent training the model.

FIG. 3 is a conceptual diagram illustrating the flow required to calculate EMA_k for the intermediate check.

With reference to FIG. 3, the condition checked in the step (S50) of checking whether the learning is completed will now be described.

Once the ensemble model converges, it no longer needs to be trained. The ensemble model can be considered to have converged when the ensemble weights no longer change. Accordingly, the condition checked in step S50 may be whether the model has converged, and the calculation flow required for this is shown in FIG. 3.

The set of all ensemble weights at the k-th iteration can be denoted ew_k. As explained for the step of forming the learning model, the ensemble weights satisfy the two conditions because of the properties of the softmax function. Thus the elements of ew_k sum to 1 and each element lies in the interval [0, 1]. These are also the properties of a probability distribution, so ew_k can be treated as one.

When the ensemble model converges, ew_k does not differ from ew_{k-1}, and the ensemble weights are fixed at their values from the (k-1)-th iteration. To measure the difference, the Kullback-Leibler divergence (KLD), a measure of how one probability distribution differs from another, can be applied. The KLD is low when the two distributions are similar and high otherwise, and it is 0 when the two are exactly the same. Therefore, the weights at which learning is complete can be identified by finding the k at which the KLD between ew_{k-1} and ew_k, denoted KL_k, is 0.

Equation 6 below gives KL_k. Here, D_KL(x||y) denotes the KLD between two probability distributions x and y, and a_i,k denotes the ensemble weight of SN_i at iteration k.
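Equation 6 is not reproduced in this text. The sketch below uses the standard discrete KLD, D_KL(p||q) = Σ p_i log(p_i / q_i), and assumes KL_k = D_KL(ew_{k-1} || ew_k). The argument order, the clipping constant `eps`, and the sample weight vectors are assumptions made for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)  # avoid log(0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

ew_prev = [0.25, 0.25, 0.25, 0.25]  # ensemble weights at iteration k-1
ew_curr = [0.10, 0.20, 0.30, 0.40]  # ensemble weights at iteration k
kl_k = kl_divergence(ew_prev, ew_curr)
print(kl_k)                               # positive: the weights changed
print(kl_divergence(ew_curr, ew_curr))    # 0.0: identical distributions
```

This sketch assumes both vectors have the same length; how the comparison is made after a sub-network has been removed is not specified in the text.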

However, KL_k becomes 0 only in the ideal case; in practice, KL_k converges to some value close to, but not equal to, 0. That value could be used as the threshold instead of 0, but it is difficult to control or predict because it depends on many factors, such as the number of sub-networks, the random initialization, and the training data set. To mitigate this problem, the difference between KL_k and KL_{k-1} is used. If the difference is 0, KL_k and KL_{k-1} are equal, and KL_k can be judged to have converged to a specific value. DIFF_k denotes the difference between KL_k and KL_{k-1}.

Empirically, however, DIFF_k can reach 0 after only a few iterations, which makes it difficult to use DIFF_k directly as the criterion. Relying on DIFF_k alone may leave each sub-network with insufficient training time, resulting in an inaccurate iteration loop. Therefore, to guarantee sufficient training time for the sub-networks, the exponential moving average of DIFF_k is adopted. The exponential moving average at the k-th iteration, EMA_k, is given by Equation 7 below.
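Equation 7 is not reproduced in this text. A conventional exponential moving average with an update coefficient α ∈ [0, 1], assumed here, would read as follows. The absolute value in DIFF_k and the exact placement of α are assumptions; this form is chosen because a larger α then makes EMA_k follow DIFF_k more closely.

```latex
\mathrm{DIFF}_k = \left| KL_k - KL_{k-1} \right|, \qquad
\mathrm{EMA}_k = \alpha \, \mathrm{DIFF}_k + (1 - \alpha) \, \mathrm{EMA}_{k-1}
```

Under this form, a single iteration with DIFF_k = 0 does not drive EMA_k to 0 unless α = 1, which is how the average delays termination.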

In the above equation, α ∈ [0, 1] is the update coefficient. In summary, the learning iteration loop ends when EMA_k becomes 0. If α is large, the loop ends early. If α is small, the number of iterations required may approach NMAX, but better accuracy can be obtained than in the former case. However, if N is small, EMA_k may become 0 after only a few iterations, because the ensemble weights change more in the early iterations. In that case, α should be set larger or a minimum number of iterations should be set.
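A minimal sketch of the stopping rule, assuming EMA_k = α·DIFF_k + (1 − α)·EMA_{k-1} with DIFF_k = |KL_k − KL_{k-1}|. The tolerance `tol` (used instead of an exact comparison with 0), the `min_iters` guard, and the sample KL sequence are illustrative assumptions.

```python
def ema_stop_iteration(kl_values, alpha=0.5, tol=1e-6, min_iters=3):
    """Return the first iteration k at which EMA_k falls below tol, else None.

    kl_values[k] holds KL_k. The EMA form and the absolute difference
    are assumed, since Equation 7 is not reproduced in the text.
    """
    ema = None
    for k in range(1, len(kl_values)):
        diff = abs(kl_values[k] - kl_values[k - 1])  # DIFF_k
        ema = diff if ema is None else alpha * diff + (1 - alpha) * ema
        if k >= min_iters and ema < tol:
            return k
    return None

kl = [0.5, 0.2, 0.1] + [0.1] * 60  # KL_k settles at a nonzero value
print(ema_stop_iteration(kl, alpha=0.9))  # stops after fewer iterations
print(ema_stop_iteration(kl, alpha=0.3))  # stops after more iterations
```

With this form, a larger α makes the loop end earlier, which matches the behavior the text describes; `min_iters` corresponds to the suggested minimum number of iterations.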

In summary, the step (S50) of checking whether the learning is completed compares the result of the previous round of the step of performing learning on each sub-network and the step of performing learning on all of the sub-networks with the result of the current round of those two steps, and may determine that learning is complete if the difference between the two results is less than a predetermined reference value.

Through step S50, training of the model that takes the influencing factors (weather information, calendar information) as input and outputs power data has been completed. In the step (S60) of estimating the missing power measurement data, information on the influencing factors at the missing time points is fed into the completed model to estimate the power data at those time points.
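Step S60 can be sketched as follows, assuming missing readings are marked as NaN and the trained model is exposed as a callable. The names `fill_missing` and `predict` are hypothetical, and the lambda in the usage lines is a stand-in for the trained ensemble.

```python
import numpy as np

def fill_missing(power, features, predict):
    """Replace NaN entries of `power` with predict(features) at the same time steps."""
    power = np.asarray(power, dtype=float)
    missing = np.isnan(power)
    filled = power.copy()
    if missing.any():
        filled[missing] = predict(np.asarray(features)[missing])
    return filled

power = [1.0, np.nan, 3.0]
features = np.array([[1.0], [2.0], [3.0]])  # one row of factors per time step
print(fill_missing(power, features, lambda X: X[:, 0] * 10.0))
```

Measured values are left untouched; only the time steps flagged as missing receive model estimates.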

For example, the missing power consumption data obtained in this way can be merged with the measured power consumption data to obtain a mid- to long-term expected power consumption pattern.

FIG. 4 shows a power measurement data missing value compensation server according to the spirit of the present invention.

The illustrated power measurement data missing value compensation server 100 may include: a data collection unit 110 that collects power measurement data and information on factors affecting the power measurement data; a data preprocessing unit 115 that normalizes the collected data and information by min-max normalization; a learning model forming unit 120 that uses the preprocessed data to form a learning model which has a structure including sub-networks, takes the information on the factors as input, and outputs the power measurement data; a learning execution unit 130 that trains the formed learning model using measured power measurement data; and a data interpolation unit 160 that estimates missing power measurement data using the trained learning model.

The data collection unit 110 performs the step (S10) of collecting data and information described above with reference to FIG. 1, the data preprocessing unit 115 performs the step (S15) of preprocessing into a form for learning, and the learning model forming unit 120 performs the step (S20) of forming the learning model.

The learning execution unit 130 performs the step (S30) of performing learning on each sub-network forming the learning model, the step (S40) of performing learning on all of the sub-networks included in the learning model, and the step (S50) of checking whether learning of the learning model is completed.

The data interpolation unit 160 performs the step (S60) of estimating the missing power measurement data described above with reference to FIG. 1.

The data collection unit 110 collects weather information and past electrical load data for a predetermined period. For example, in a system having a central collection device 200 that records power-related measurements from a number of power meters distributed across the grid and that collects and records weather information as factors affecting power consumption, the data collection unit 110 may, as illustrated, be a separate DB server that acquires the data and information recorded by the central collection device 200.

For learning speed and accuracy, the preprocessing unit 115 converts the variables to values between 0 and 1 through min-max normalization. To reflect their periodicity, the month, day, hour, and minute data may be preprocessed by applying the sin and cos functions as in Equations 1 and 2 above. More specifically, period denotes the period of the variable in question: when month data is substituted for time, period is 12, and when hour data is substituted, period is set to 24. As a result, the preprocessed variables shown in Table 1 above are obtained.
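The two preprocessing operations can be sketched as below. Equations 1 and 2 are not part of this section, so the usual forms sin(2π·time/period) and cos(2π·time/period) are assumed; the function names and sample values are illustrative.

```python
import numpy as np

def min_max(x):
    """Scale values linearly to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def cyclic(time, period):
    """Encode a periodic variable as a (sin, cos) pair (assumed form)."""
    angle = 2.0 * np.pi * np.asarray(time, dtype=float) / period
    return np.sin(angle), np.cos(angle)

print(min_max([10.0, 20.0, 30.0]))  # temperature-like values scaled to [0, 1]
print(cyclic(23, 24), cyclic(0, 24))  # hour 23 and hour 0 map to nearby points
```

The paired encoding keeps the end of a cycle adjacent to its start, which a single normalized value would not.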

The learning model forming unit 120 may perform data sampling on the preprocessed data and information in order to form a learning model made up of a plurality of sub-networks.

The learning model forming unit 120 selects an arbitrary period from the power consumption data, and the power consumption data of the selected period is used by the single model learning unit to train a multilayer perceptron model. After this process has been repeated a certain number of times, the weight adjustment unit raises the weights of the better-performing multilayer perceptron models (that is, sub-networks). The procedure is as shown in FIG. 2 described above.
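One way to read "selects an arbitrary period" is that each sub-network receives a randomly placed contiguous window of the time series. The sketch below follows that reading; the contiguous-window interpretation, the function name, and all parameters are assumptions.

```python
import numpy as np

def sample_periods(n_points, n_subnets, length, seed=0):
    """Pick one random contiguous index window per sub-network."""
    rng = np.random.default_rng(seed)
    # The upper bound is exclusive, so every window ends inside the series.
    starts = rng.integers(0, n_points - length + 1, size=n_subnets)
    return [np.arange(s, s + length) for s in starts]

windows = sample_periods(n_points=1000, n_subnets=5, length=200)
print([(w[0], w[-1]) for w in windows])  # first and last index of each window
```

Each index array would then select the rows used to train one multilayer perceptron.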

Viewed functionally, the learning execution unit 130 can be regarded as consisting of a single model learning unit that performs learning on each sub-network forming the learning model, a weight adjustment unit that adjusts the weight of each sub-network through learning over all of the sub-networks included in the learning model, and a model verification unit that checks whether learning of the learning model is completed.

The ensemble model constructed by the learning model forming unit 120 is verified by the model verification unit. The verification process is as shown in FIG. 3 described above. For the verification, the probability distributions can be compared through the Kullback-Leibler divergence at each learning stage, and the result can be used to verify how completely the model has been trained.

Those skilled in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features, and that the embodiments described above are therefore illustrative in all respects and not restrictive. The scope of the present invention is indicated by the following claims rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: missing value compensation server
110: data collection unit
115: data preprocessing unit
120: learning model forming unit
130: learning execution unit
160: data interpolation unit
200: central collection device

Claims

1. A method of compensating for missing values in power measurement data, the method comprising:
collecting power measurement data and information on factors affecting the power measurement data;
preprocessing the collected power measurement data and the information on the factors into a form for learning;
forming, using the preprocessed data, a learning model that has a structure including sub-networks, takes the information on the factors as input, and outputs the power measurement data;
performing learning for each sub-network forming the learning model by using measured power measurement data;
performing learning on all of the sub-networks included in the learning model;
checking whether learning of the learning model is completed; and
estimating missing power measurement data by using the learning model for which learning is completed.

2. The method of claim 1,
wherein the power measurement data is a measured value of power consumption, and
the factors are meteorological data and calendar information.

3. The method of claim 1,
wherein the learning model is an ensemble model, and
in the step of preprocessing into a form for learning,
the input data of the ensemble learning model is normalized by min-max normalization such that the input item for each factor is converted into a real value having an absolute value between 0 and 1, a positive real value, or a vector value.

4. The method of claim 3,
wherein, in the step of forming the learning model,
the number of sub-networks to be included in the learning model is determined, and
after random sampling of the preprocessed data,
sub-networks having ensemble weights are formed.

5. The method of claim 2,
wherein, in the step of performing learning for each sub-network,
learning is performed a predetermined number of times in the direction that minimizes a loss function based on the difference between actual power consumption data and predicted energy consumption data for each sub-network.

6. The method of claim 5,
wherein the loss function is given by the following equation.

(where t is the actual electrical energy consumption data and y_i is the energy consumption predicted by the sub-network SN_i)

7. The method of claim 1,
wherein, in the step of performing learning on all of the sub-networks,
it is checked whether the ensemble weight of each sub-network is smaller than a predefined threshold (α_th), and a sub-network whose ensemble weight is smaller than the threshold is deleted from the learning model.

8. The method of claim 1,
wherein, in the step of performing learning on all of the sub-networks,
learning is performed in the direction that minimizes a loss function according to the following equation for the sub-networks as a whole.

(where t is the actual electrical energy consumption data and y_out is the energy consumption estimated by the sub-networks as a whole)

9. The method of claim 1,
wherein, in the step of checking whether the learning is completed,
a result of the previous round of the step of performing learning for each sub-network and the step of performing learning on all of the sub-networks
is compared with a result of the current round of the step of performing learning for each sub-network and the step of performing learning on all of the sub-networks, and
learning is determined to be completed if the difference between the two results is less than a predetermined reference value.

10. A server for compensating for missing values in power measurement data, the server comprising:
a data collection unit that collects power measurement data and information on factors affecting the power measurement data;
a data preprocessing unit that normalizes the collected data and information by min-max normalization;
a learning model forming unit that uses the preprocessed data to form a learning model which has a structure including sub-networks, takes the information on the factors as input, and outputs the power measurement data;
a learning execution unit that trains the formed learning model using measured power measurement data; and
a data interpolation unit that estimates missing power measurement data using the learning model for which learning is completed.

11. The server of claim 10,
wherein the power measurement data is a measured value of power consumption, and
the factors are meteorological information including temperature and calendar information including month, day, and day of the week.

12. The server of claim 11,
wherein the data preprocessing unit
converts the meteorological information into numerical data between 0 and 1 through min-max normalization,
converts the month and day information into a vector value having an absolute value between 0 and 1, and
converts the day information into an integer value of 0 or 1.

13. The server of claim 10,
wherein the learning execution unit
performs learning for each sub-network forming the learning model by using the measured power measurement data,
performs learning on the weight of each sub-network in the learning model, and
determines whether learning of the learning model is completed.

14. The server of claim 13,
wherein the learning execution unit
compares a result of the previous round of learning for each sub-network and learning on all of the sub-networks
with a result of the current round of learning for each sub-network and learning on all of the sub-networks, and
determines that learning is completed if the difference between the two results is less than a predetermined reference value.