KR102583178B1

KR102583178B1 - Method and apparatus for predicting power generation

Info

Publication number: KR102583178B1
Application number: KR1020210152453A
Authority: KR
Inventors: 장윤; 박혜연; 정성민
Original assignee: 세종대학교산학협력단
Priority date: 2021-11-08
Filing date: 2021-11-08
Publication date: 2023-09-26
Also published as: KR20230066927A

Abstract

The embodiments relate to a method and device for predicting power generation. Input variables are classified into a plurality of feature data sets using the morphological characteristics of their graphs, power generation is predicted from those sets, and the relationship between the input variables and the predicted power generation is expressed visually using graphs.

Description

Method and device for predicting power generation {METHOD AND APPARATUS FOR PREDICTING POWER GENERATION}

Embodiments of the present invention relate to a method and device for predicting power generation and, more specifically, to a method and device that derive a graph of input data and predict power generation using the shape of that graph.

Solar power generation is attracting attention as a strong substitute for fossil fuels and as a key element of the smart grid. Solar output is hard to predict because it varies widely with weather conditions and with the physical conditions of the generation site. This variability causes imbalances between energy production and distribution and reduces the stability of the power system. Highly accurate prediction of solar power generation is therefore necessary for stable smart grid operation.

Solar panels that produce solar energy generate multivariate time series data, which is nonlinear and nonstationary. Because statistical properties such as the mean, variance, and covariance change over time and irregular fluctuations occur, accurate prediction of multivariate time series data is difficult.

In particular, when input variables are strongly correlated with one another, multicollinearity may arise during training. Multicollinearity refers to the case in which the input variables used to train a prediction model are strongly correlated with each other rather than independent.

When multicollinearity exists in the data, the coefficients of the prediction model may be impossible to estimate, or the model's predictions may change substantially in response to small changes in the data.

Various studies have also been conducted to predict future changes in multivariate time series data by analyzing the dependencies between past and present values. However, conventional prediction methods do not guarantee the order and continuity of the training data when training a prediction model, which limits their usefulness for time series prediction.

In addition, because such models have a recurrent structure that preserves past states, their complexity grows as training is repeated, making it difficult to explain the internal structure of the model or how its predictions are derived.

An object of the power generation prediction method and device according to an embodiment of the present invention is to predict power generation using the shape of a graph of the input data.

Another object of the power generation prediction method and device according to an embodiment of the present invention is to prevent multicollinearity among the input variables.

Another object of the power generation prediction method and device according to an embodiment of the present invention is to visualize the process by which the power generation prediction result is derived.

However, the technical problems addressed by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for solving the above problems, a power generation prediction method includes: generating a first graph for one or more items of input variable data among a plurality of input variable data; classifying the input variable data into a plurality of feature data sets based on the shape of the first graph; and predicting power generation by inputting the input variable data into a power generation prediction model trained using the plurality of feature data sets.

Generating the first graph according to an embodiment includes: generating second graphs in which one or more of the plurality of input variable data are used as independent and/or dependent variables; deriving, from the second graphs, the correlation between the independent variable and the dependent variable; selecting a third graph whose correlation is greater than or equal to a preset reference value; and generating the first graph using one or more of the input variables corresponding to the third graph.

Predicting power generation according to an embodiment includes classifying the input variable data into the feature data sets using one or more of the maximum value, minimum value, inflection point positions, number of inflection points, slope, distribution, and mean value of the first graph.

A power generation prediction device according to an embodiment includes a memory that stores a power generation prediction program and a processor that executes the program stored in the memory. By executing the program, the processor generates a first graph for one or more items of input variable data among a plurality of input variable data, classifies the input variable data into a plurality of feature data sets based on the shape of the first graph, and predicts power generation by inputting the input variable data into a power generation prediction model trained using the plurality of feature data sets.

The power generation prediction method and device according to an embodiment of the present invention can predict power generation using the shape of a graph of the input data.

The power generation prediction method and device according to an embodiment of the present invention can also prevent multicollinearity among the input variables.

The power generation prediction method and device according to an embodiment of the present invention can also visualize the process by which the power generation prediction result is derived.

Figure 1 is a configuration diagram of a power generation prediction device according to an embodiment.
Figure 2 is a flowchart of a method for predicting power generation according to an embodiment.
Figure 3 is a flowchart of a first graph generation method according to an embodiment.
Figure 4 is an example diagram of a second graph according to an embodiment.
Figure 5 is an example diagram of a power generation graph according to an embodiment.
Figure 6 is an example diagram of a first graph according to an embodiment.
Figure 7 is an example diagram of feature data set classification according to an embodiment.
Figure 8 is an example diagram of feature data set classification according to an embodiment.
Figure 9 is an example diagram of an ensemble model according to an embodiment.
Figure 10 is a conceptual diagram of data processing of an ensemble model according to an embodiment.
Figure 11 is a flowchart of a method for predicting power generation according to an embodiment.
Figure 12 is a flowchart of a fourth graph generation method according to an embodiment.
Figure 13 is an example diagram of a fifth graph according to an embodiment.
Figure 14 is an example diagram of a fifth graph according to an embodiment.
Figure 15 is an example diagram of a fifth graph according to an embodiment.
Figure 16 is an example diagram of a sixth graph according to an embodiment.
Figure 17 is an exemplary diagram of power generation prediction results according to an embodiment.
Figure 18 is an exemplary diagram of power generation prediction results according to an embodiment.
Figure 19 is an exemplary diagram of power generation prediction results according to an embodiment.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can easily carry it out. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. In addition, when a part "includes" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In this specification, a 'device' includes a unit realized by hardware, a unit realized by software, and a unit realized by both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A 'unit' is not limited to software or hardware; it may be configured to reside in an addressable storage medium or to execute on one or more processors. Thus, as an example, a 'unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided by the components and 'units' may be combined into fewer components and 'units' or further separated into additional components and 'units'. Components and 'units' may also be implemented to execute on one or more CPUs in a device.

The accompanying drawings are provided only to aid understanding of the embodiments disclosed herein. The technical idea disclosed in this specification is not limited by the drawings and should be understood to cover all changes, equivalents, and substitutes within the spirit and technical scope of the present invention.

Terms containing ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that component, but other components may also be present in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, no other component is present in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this application, terms such as "comprise" or "have" designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, the configuration of the power generation prediction device 1 according to an embodiment is described with reference to FIG. 1.

FIG. 1 is a configuration diagram of a power generation prediction device according to an embodiment.

Referring to FIG. 1, the power generation prediction device 1 predicts power generation using input variable data and power generation data. To this end, it may receive power generation data from the generator 100, or receive power generation data and input variable data accumulated and stored in the database 200.

That is, the generator 100 may transmit power generation data to the database 200 or the memory 300. The generator 100 may include one or more generators. The database 200 collects and stores the power generation data received from the generator 100, or big data on power generation and input variables, and may transmit it to the memory 300.

The input variable data may include weather-related multivariate time series data. For example, when the generator 100 includes a plurality of solar power generators, the input variable data may include time series data on temperature, humidity, amount of sunshine, rainfall, wind speed, sunrise and sunset times, and the like. The input variable data may include not only measured data but also values set arbitrarily by the user.

The solar power generators and weather data above are merely examples given for convenience of explanation, and the embodiments are not limited to them. The generator 100 may include wind, thermal, hydroelectric, or nuclear power generators, and the input variable data may include not only weather data but also any time series data, or any data that can be expressed as a graph, that is useful for predicting power generation.

The memory 300 stores the power generation prediction program. The name "power generation prediction program" is used for convenience of explanation and does not by itself limit the program's functions. The memory 300 may store at least one of: data generated or measured by the generator 100, data stored in the database 200, information and data required for the functions performed by the processor 400, and data generated as the processor 400 executes.

The memory 300 should be interpreted as a general term covering both non-volatile storage devices, which retain stored information without power, and volatile storage devices, which require power to retain stored information. The memory 300 may also temporarily or permanently store data processed by the processor 400.

In addition to volatile storage devices that require power to retain stored information, the memory 300 may include magnetic storage media or flash storage media, but the scope of the present invention is not limited thereto.

In addition, the database 200 may receive from the memory 300, and store, the accumulated results of the power generation prediction program, data generated incidentally for predicting power generation, and the like. The database 200 may form part of the memory 300, but it need not be located inside the solar power generation prediction device 1 and may instead be located outside it.

The processor 400 is configured to execute the power generation prediction program stored in the memory 300. The processor 400 may include various types of devices that control and process data. The processor 400 may refer to a data processing device embedded in hardware that has a physically structured circuit for performing the functions expressed by the code or instructions included in a program. In one example, the processor 400 may be implemented as a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or the like, but the scope of the present invention is not limited thereto.

By executing the power generation prediction program, the processor 400 is configured to perform the following functions and procedures.

By executing the power generation prediction program, the processor 400 visualizes the input variable data that affect power generation and classifies the input variable data into a plurality of feature data sets using the morphological characteristics of the visualized data.

Generating a graph from the input variable data is one way to visualize it. Accordingly, a feature data set refers to data classified, using a graph of the input variable data, according to visual characteristics such as separated patterns and morphological features. Here, morphological features refer to one or more visually recognizable characteristics of the graph, such as its maximum value, minimum value, inflection point positions, number of inflection points, slope, distribution, and mean value.
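A minimal sketch of extracting these morphological features from a sampled curve follows. It assumes equally spaced samples, uses the population standard deviation as a stand-in for "distribution" and the end-to-end average as the slope, and approximates inflection points where the sign of the discrete second difference changes; the source does not define these quantities precisely. `shape_features` is a hypothetical name.

```python
def shape_features(y):
    """Morphological features of a curve sampled at equally spaced points (assumed)."""
    n = len(y)
    d1 = [y[i + 1] - y[i] for i in range(n - 1)]    # discrete slope
    d2 = [d1[i + 1] - d1[i] for i in range(n - 2)]  # discrete curvature, centred on sample i + 1
    # Keep non-zero curvatures with their sample position, then look for sign changes
    signs = [(i + 1, c > 0) for i, c in enumerate(d2) if c != 0]
    inflections = [pos for (pos, up), (_, prev_up) in zip(signs[1:], signs) if up != prev_up]
    mean = sum(y) / n
    std = (sum((v - mean) ** 2 for v in y) / n) ** 0.5
    return {
        "max": max(y),
        "min": min(y),
        "argmax": y.index(max(y)),
        "mean": mean,
        "std": std,                          # assumed proxy for "distribution"
        "slope": (y[-1] - y[0]) / (n - 1),   # average end-to-end slope
        "inflection_positions": inflections,
        "n_inflections": len(inflections),
    }
```

For a bell-shaped daily profile such as `[0, 1, 4, 7, 8, 7, 4, 1, 0]`, the sketch reports the peak at index 4 and two inflection points, at indices 3 and 7 (the first samples where the new curvature sign appears). How such features are mapped to particular feature data sets is not specified here.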

By executing the program, the processor 400 trains a machine learning model and predicts power generation using the feature data sets.

Specifically, the processor 400 may execute the program to generate a power generation prediction model. The power generation prediction model may be an artificial intelligence model trained, with the feature data sets as training data, to output a power generation prediction result.

Here, the power generation prediction model is an ensemble model. Power generation is therefore predicted for each item of input variable data using the feature data set corresponding to that input variable.

The processor 400 may also execute the program to preprocess the feature data sets by performing one or more of missing value handling, the Yeo-Johnson transformation, and min-max scaling.
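The three preprocessing steps can be sketched in plain Python as below. The mean-imputation policy for missing values and the fixed Yeo-Johnson parameter `lmbda` are assumptions: the source does not say how missing values are filled, and in practice λ is usually estimated from the data (for example by maximum likelihood), which this sketch omits.

```python
import math


def fill_missing(values):
    """Replace None with the mean of the observed values (an assumed imputation policy)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def yeo_johnson(y, lmbda):
    """Yeo-Johnson power transform of a single value for a given lambda."""
    if y >= 0:
        return math.log1p(y) if lmbda == 0 else ((y + 1) ** lmbda - 1) / lmbda
    # negative branch uses 2 - lambda on the reflected value
    return -math.log1p(-y) if lmbda == 2 else -((1 - y) ** (2 - lmbda) - 1) / (2 - lmbda)


def min_max(values):
    """Rescale to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(values, lmbda=1.0):
    filled = fill_missing(values)
    transformed = [yeo_johnson(v, lmbda) for v in filled]
    return min_max(transformed)
```

With `lmbda=1.0` the transform is the identity, so `preprocess([0, None, 4])` only imputes the gap with 2 and scales the result to `[0.0, 0.5, 1.0]`.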

The processor 400 may also execute the program to train on the input variable data using one or more of a recurrent neural network (RNN), long short-term memory (LSTM), and dense (fully connected) layer structures.
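Sequence models such as RNNs and LSTMs are commonly trained on fixed-length windows cut from the series in time order, which keeps the order and continuity of the data that the background section identifies as a limitation of conventional methods. The windowing below is a minimal sketch; `make_windows`, the window length, and the horizon are hypothetical choices, and the source does not describe how its training samples are formed.

```python
def make_windows(series, window, horizon=1):
    """Cut an ordered series into (past window, value `horizon` steps ahead) pairs.

    Windows are taken in time order without shuffling, so each input keeps the
    original ordering and continuity of the samples. Elements of `series` may be
    scalars or per-time-step feature lists for multivariate data.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets
```

For example, `make_windows([1, 2, 3, 4, 5], 3)` yields the inputs `[[1, 2, 3], [2, 3, 4]]` with targets `[4, 5]`.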

To explain how the power generation prediction result is derived, the processor 400 may also execute the program to generate graphs of the causal relationship between the input variable data or feature data sets and the power generation prediction result.

That is, by executing the program, the processor 400 can use the intermediate results produced while deriving the power generation prediction to present the derivation process visually.

The method for predicting power generation using the power generation prediction program is described in detail below with reference to FIGS. 2 to 19.

The processor 400 may also exchange information with the terminal 500 through a communication unit. It may thus receive input variable data from the terminal 500, or transmit the results of the power generation prediction program to the terminal 500 for display to the user.

The terminal 500 may be, for example, a notebook or desktop computer equipped with a web browser, or any type of portable, mobile handheld wireless communication device such as a smartphone or tablet PC. The communication network may be implemented as a wired network such as a local area network (LAN), wide area network (WAN), or value added network (VAN), or as any type of wireless network such as a mobile radio communication network or satellite communication network.

Hereinafter, a method for predicting power generation according to an embodiment is described with reference to FIG. 2.

FIG. 2 is a flowchart of a method for predicting power generation according to an embodiment.

Referring to FIG. 2, in the first graph generation step (S100), a first graph is generated to visualize each of the plurality of input variable data. In the feature data set classification step (S200), the input variable data are classified into a plurality of feature data sets using the morphological characteristics of the first graph. In the training and prediction step (S300), the power generation prediction model is trained using the feature data sets, and a power generation prediction result is derived.
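The three steps S100 to S300 can be wired together as in the toy sketch below. Every rule here is a hypothetical stand-in chosen only to show the data flow: the "graph" is just the list of (time index, value) points, the classification looks only at where the peak falls, and the "model" returns the mean target of the training samples in the matching feature data set.

```python
def generate_first_graph(series):
    # S100 (stand-in): the graph is the list of (time index, value) points
    return list(enumerate(series))


def classify_shape(graph):
    # S200 (assumed rule): label a curve by whether it peaks in the first or second half
    values = [v for _, v in graph]
    peak = values.index(max(values))
    return "early_peak" if peak < len(values) / 2 else "late_peak"


def train_and_predict(labels, targets, query_label):
    # S300 (stand-in model): mean target of the samples sharing the query's feature set
    matched = [t for label, t in zip(labels, targets) if label == query_label]
    return sum(matched) / len(matched)


days = [[0, 5, 3, 1], [0, 1, 3, 5], [1, 6, 2, 0]]
power = [10.0, 20.0, 12.0]
labels = [classify_shape(generate_first_graph(d)) for d in days]
query = classify_shape(generate_first_graph([0, 4, 2, 1]))
print(train_and_predict(labels, power, query))  # 11.0
```

In a real system each stand-in would be replaced by the graph construction, shape-based classification, and trained model described in this specification.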

Hereinafter, a method for generating the first graph according to an embodiment is described with reference to FIGS. 3 and 8.

FIG. 3 is a flowchart of a method for generating the first graph according to an embodiment, FIG. 4 is an example of a second graph according to an embodiment, and FIG. 5 is an example of a power generation graph according to an embodiment.

Referring to FIG. 3, in the second graph generation step (S110), second graphs are generated from the plurality of input variable data, each using one or more of the input variables as the independent and/or dependent variable.

FIG. 4 is an example of the second graphs for predicting solar power generation when the input variables are precipitation (Rain), temperature (Temprt), wind speed (Wnd_spd), humidity (Humdt), cloud cover (Cloud), and solar power generation (Pow).

The x-axis corresponds to the independent variable and the y-axis to the dependent variable; a pair plot is generated by selecting one of the input variables above as the independent variable and one as the dependent variable. With precipitation (Rain), temperature (Temprt), wind speed (Wnd_spd), humidity (Humdt), cloud cover (Cloud), and solar power generation (Pow) each taking the role of both independent and dependent variable, 36 (6 x 6) graphs are generated.
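The count of 36 follows from pairing each of the six variables with every variable, including itself. The short enumeration below checks this; it only lists the panel coordinates and draws nothing. A plotting library such as seaborn's `pairplot` can render such a grid, but the source does not say which tool produced FIG. 4.

```python
from itertools import product

VARIABLES = ["Rain", "Temprt", "Wnd_spd", "Humdt", "Cloud", "Pow"]

# One panel per ordered (x, y) pair; the x == y pairs form the diagonal of the grid
panels = list(product(VARIABLES, repeat=2))
print(len(panels))  # 36
```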

The pair plot shown in FIG. 4 visually expresses the correlation between the independent and dependent variables; that is, the second graph represents the correlation among the input variables. Specifically, the more closely a pair plot resembles the graph of y = x, the stronger the positive correlation between the independent and dependent variables. Conversely, the more closely it resembles the graph of y = -x, the stronger the negative correlation.
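The visual reading of the pair plot (closer to y = x means a stronger positive correlation, closer to y = -x a stronger negative one) corresponds to the sign and magnitude of a correlation coefficient. The following is a minimal sketch, assuming the Pearson coefficient as the measure (the text does not name one) and using invented toy values, that computes it for every ordered pair of variables, i.e. one value per panel of the 6 × 6 pair plot.

```python
from itertools import product
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pair_correlations(data):
    """Correlation for every ordered (independent, dependent) pair,
    one entry per pair-plot panel, diagonal included."""
    return {(a, b): pearson(data[a], data[b]) for a, b in product(data, repeat=2)}

# Hypothetical toy readings: column names follow FIG. 4, values are invented.
data = {
    "Rain":    [0.0, 2.0, 0.0, 5.0, 1.0],
    "Temprt":  [21.0, 18.0, 25.0, 15.0, 23.0],
    "Wnd_spd": [3.1, 4.0, 2.2, 5.5, 2.9],
    "Humdt":   [40.0, 70.0, 35.0, 85.0, 50.0],
    "Cloud":   [2.0, 8.0, 1.0, 9.0, 4.0],
    "Pow":     [410.0, 150.0, 520.0, 60.0, 380.0],
}
corr = pair_correlations(data)
print(len(corr))  # 36 panels for 6 variables
print(round(corr[("Cloud", "Pow")], 3))  # close to -1 for this toy data
```

A value near +1 corresponds to a y = x-like panel and a value near -1 to a y = -x-like panel; the diagonal is always 1.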

In addition, as shown in FIG. 4, the second graph according to the embodiment provides a brushing function for checking the correlation of data within a specific section. The brushed data section is displayed in blue, so that the correlation of the data in that section can be identified immediately by color.

Therefore, using the second graph, one of the input variables can be selected as the target variable, and the input variables that are highly correlated with the target variable can be identified. The power generation prediction device 1 can then be trained using the input variables that have a strong correlation.

If a strong correlation exists between two input variables that are not the target variable, multicollinearity may arise during training. Multicollinearity refers to a situation in which the input variables used to train a prediction model are strongly correlated with one another rather than independent.

When multicollinearity exists in the data, the coefficients of the prediction model may not be estimable, or the prediction results may change greatly with small changes in the data. Therefore, the occurrence of multicollinearity can be reduced by selecting one of the input variables as the target variable and training on another input variable that is highly correlated with the target variable.

To generate the first graph, a graph among the second graphs whose shape resembles the graph of y = x or y = -x may be selected as a third graph, and the first graph may then be generated using the input variable corresponding to the third graph. Alternatively, the degree of correlation between the independent and dependent variables may be derived from the shape of each second graph, and a graph whose degree of correlation is greater than or equal to a preset reference value may be set as the third graph; the first graph is then generated using the input variable corresponding to that third graph.
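The threshold-based choice of third graphs can be sketched as follows. This is a minimal illustration, assuming Pearson correlation, an absolute-value comparison (so that y = -x-like panels qualify as well as y = x-like ones) and a hypothetical reference value of 0.7. It also flags strongly correlated pairs among the selected non-target variables, which is the multicollinearity case described above. The variable names and values are invented.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def select_third_graphs(data, target, threshold=0.7):
    """Input variables whose |correlation| with the target reaches the reference value."""
    return [name for name in data
            if name != target and abs(pearson(data[name], data[target])) >= threshold]

def collinear_pairs(data, names, threshold=0.7):
    """Pairs of selected non-target variables that are themselves strongly correlated."""
    return [(a, b) for a, b in combinations(names, 2)
            if abs(pearson(data[a], data[b])) >= threshold]

# Hypothetical toy data: "Irr" rises with Pow, "Cloud" falls with it, "Wind" is unrelated.
data = {
    "Pow":   [1.0, 2.0, 3.0, 4.0, 5.0],
    "Irr":   [2.0, 4.0, 6.0, 8.0, 10.0],
    "Cloud": [5.0, 4.0, 3.0, 2.0, 1.0],
    "Wind":  [1.0, 3.0, 2.0, 3.0, 1.0],
}
selected = select_third_graphs(data, "Pow")
print(selected)                         # ['Irr', 'Cloud']
print(collinear_pairs(data, selected))  # [('Irr', 'Cloud')]
```

In this toy case both selected variables carry the same information, so training on both together would reintroduce the multicollinearity the text warns about.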

FIG. 5 is an example of a target variable graph, showing solar power generation over time. When solar power generation is selected as the target variable, a line chart of solar power generation can be created as shown in FIG. 5.

Specifically, FIG. 5(a) shows solar power generation by hour (HOUR), FIG. 5(b) by year, FIG. 5(c) by season, and FIG. 5(d) by month.

However, the embodiment is not limited thereto, and the data may be expressed using various graphs and methods that effectively visualize the data according to the input variable.

FIG. 6 is an exemplary diagram of the first graph.

FIG. 6 shows a case in which, with solar power generation as the target variable, first graphs are generated for wind speed, humidity, solar radiation, ground temperature, and cloud cover. That is, among the second graphs, those with solar power generation as the independent variable and wind speed, humidity, solar radiation, ground temperature, and cloud cover as the dependent variables are selected as third graphs, and a first graph is generated for each of these five variables.

The first graph is used to derive the features of the input variable data; the input data can be classified into a plurality of feature data sets using the morphological features of the first graph.

Hereinafter, a method of classifying feature data sets according to an embodiment will be described with reference to FIG. 7.

FIG. 7 is an exemplary diagram of feature data set classification according to an embodiment.

FIGS. 7(a) and 7(b) show data division groups in which the input variable data for ground temperature, taken from the first graphs described above, are classified into a plurality of feature data sets. That is, the ground temperature data are divided into four groups (Group0, Group1, Group2, Group3). Each group may be displayed in a different color so that the groups are visually distinct.

Specifically, when the first graph for ground temperature shown in FIG. 6 is selected, the ground temperature data are classified into four feature data sets using one or more morphological features of the first graph, such as its maximum value, minimum value, inflection point locations, number of inflection points, slope, distribution, and average value.
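The text lists the morphological features but does not say how they are turned into groups. The following is a minimal sketch under stated assumptions: the first graph is a sampled series, the slope is a least-squares fit against the sample index, an inflection point is where the sign of the discrete second difference changes, and the four groups are equal-width bins between the minimum and maximum. The binning rule is a hypothetical stand-in, not the embodiment's rule.

```python
from statistics import mean, pstdev

def shape_features(ys):
    """Morphological features of a sampled curve such as a first graph."""
    n = len(ys)
    x_mean, y_mean = (n - 1) / 2, mean(ys)
    # Least-squares slope of y against the sample index.
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
             / sum((x - x_mean) ** 2 for x in range(n)))
    # Discrete second difference approximates curvature at samples 1..n-2.
    d2 = [ys[i + 1] - 2 * ys[i] + ys[i - 1] for i in range(1, n - 1)]
    inflections, prev_sign = [], 0
    for i, d in enumerate(d2, start=1):
        sign = (d > 0) - (d < 0)
        if sign:  # skip flat (zero-curvature) samples
            if prev_sign and sign != prev_sign:
                inflections.append(i)  # first sample where the new curvature sign is seen
            prev_sign = sign
    return {"max": max(ys), "min": min(ys), "mean": y_mean, "std": pstdev(ys),
            "slope": slope, "inflections": inflections, "n_inflections": len(inflections)}

def classify_equal_width(ys, n_groups=4):
    """Hypothetical rule: assign each sample to Group0..Group{n_groups-1}
    by equal-width bins between the curve's minimum and maximum."""
    f = shape_features(ys)
    lo, hi = f["min"], f["max"]
    width = (hi - lo) / n_groups or 1.0  # guard against a constant series
    return [min(int((y - lo) / width), n_groups - 1) for y in ys]

print(classify_equal_width([0, 1, 2, 3, 4, 5, 6, 7]))  # [0, 0, 1, 1, 2, 2, 3, 3]
cubic = [float(x ** 3) for x in range(-3, 4)]
print(shape_features(cubic)["n_inflections"])          # 1
```

Any other feature in the dictionary (slope, inflection positions, spread) could drive the split instead; the number of groups is a parameter, consistent with the statement that it may vary.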

FIG. 7(c) shows a line chart of ground temperature, the selected input variable, and FIG. 7(d) shows a line chart of solar power generation, the target variable. In this way, the classification of an input variable into a plurality of feature data sets according to the morphological features of its graph can be expressed visually.

Although FIG. 7 shows only an example in which ground temperature is selected as the input variable for classifying feature data sets, embodiments of the present invention are not limited thereto, and any of the input variables may be selected. Likewise, although an example of classifying the input variable data into four groups is shown, embodiments of the present invention are not limited thereto, and the number of groups and feature data sets may vary according to the morphological features of the first graph.

FIG. 8 is an exemplary diagram of feature data set classification according to an embodiment.

FIG. 8 shows examples of feature data sets when the input variables are weather pattern, seasonal climate, and power generation, respectively.

When the input variable is the weather pattern, the feature data sets may be classified as sunny (SUNNY DATA), rainy (RAINY DATA), cloudy (CLOUDY DATA), and windy (WINDY DATA). When the input variable is the seasonal climate, the feature data sets may be classified as spring (SPRING DATA), summer (SUMMER DATA), autumn (AUTUMN DATA), and winter (WINTER DATA). When the input variable is power generation, the feature data sets may be divided into low-power (LOW-POWER DATA) and high-power (HIGH-POWER DATA).
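The three groupings of FIG. 8 can be illustrated with simple labelling rules. This is only a sketch: the record fields, the priority order for weather labels, the wind and cloud thresholds, the Northern Hemisphere meteorological seasons and the median split for power are all hypothetical choices, since the text gives the labels but not the rules that produce them.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Record:
    month: int        # 1-12
    rain_mm: float    # precipitation
    wind_ms: float    # wind speed in m/s
    cloud: float      # cloud cover on a 0-10 scale (assumed)
    power_kwh: float  # solar power generation

SEASONS = ("WINTER", "WINTER", "SPRING", "SPRING", "SPRING", "SUMMER",
           "SUMMER", "SUMMER", "AUTUMN", "AUTUMN", "AUTUMN", "WINTER")

def weather_label(r, wind_limit=8.0, cloud_limit=6.0):
    """Hypothetical priority rule: rain first, then strong wind, then cloud, else sunny."""
    if r.rain_mm > 0:
        return "RAINY DATA"
    if r.wind_ms >= wind_limit:
        return "WINDY DATA"
    if r.cloud >= cloud_limit:
        return "CLOUDY DATA"
    return "SUNNY DATA"

def season_label(month):
    """Meteorological seasons (Mar-May spring, Jun-Aug summer, Sep-Nov autumn)."""
    return SEASONS[month - 1] + " DATA"

def split_sub_datasets(records):
    """Group records into the weather, season and power sub-datasets (label -> records)."""
    cut = median(r.power_kwh for r in records)  # hypothetical low/high boundary
    subs = {"weather": {}, "season": {}, "power": {}}
    for r in records:
        subs["weather"].setdefault(weather_label(r), []).append(r)
        subs["season"].setdefault(season_label(r.month), []).append(r)
        level = "HIGH-POWER DATA" if r.power_kwh >= cut else "LOW-POWER DATA"
        subs["power"].setdefault(level, []).append(r)
    return subs

records = [Record(1, 0.0, 2.0, 1.0, 10.0), Record(4, 3.0, 2.0, 9.0, 20.0),
           Record(7, 0.0, 9.0, 2.0, 30.0), Record(10, 0.0, 3.0, 8.0, 40.0)]
subs = split_sub_datasets(records)
print(sorted(subs["weather"]))  # ['CLOUDY DATA', 'RAINY DATA', 'SUNNY DATA', 'WINDY DATA']
```

Each of the three dictionaries corresponds to one sub-dataset that could be fed to its own sub-model.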

Hereinafter, the structure of the ensemble model according to an embodiment will be described with reference to FIGS. 9 and 10.

FIG. 9 is an exemplary diagram of an ensemble model according to an embodiment.

Referring to FIG. 9, the ensemble model is a learning model that is trained in parallel on each of the plurality of input variable data and derives a power generation prediction result from each input variable data.

For example, as shown in FIG. 8, sub-dataset 1 (Sub-dataset 1) may correspond to the feature data sets separated by weather pattern: sunny (SUNNY DATA), rainy (RAINY DATA), cloudy (CLOUDY DATA), and windy (WINDY DATA).

Sub-dataset 1 is input to sub-model 1 (Sub-model 1), a sub-model of the ensemble model. Sub-model 1 is trained on the feature data sets separated by weather pattern and derives a power generation prediction result (Prediction 1) from them.

Similarly, sub-dataset 2 (Sub-dataset 2) may correspond to the feature data sets obtained when the input variable is seasonal climate: spring (SPRING DATA), summer (SUMMER DATA), autumn (AUTUMN DATA), and winter (WINTER DATA).

Sub-dataset 2 is input to sub-model 2 (Sub-model 2), a sub-model of the ensemble model. Sub-model 2 is trained on the feature data sets classified by seasonal climate and derives a power generation prediction result (Prediction 2) from them.

Sub-dataset 3 (Sub-dataset 3) may correspond to the feature data sets obtained when the input variable is power generation: low-power (LOW-POWER DATA) and high-power (HIGH-POWER DATA).

Sub-dataset 3 is input to sub-model 3 (Sub-model 3), a sub-model of the ensemble model. Sub-model 3 is trained on the feature data sets classified by power generation and derives a power generation prediction result (Prediction 3) from them.

More generally, the ensemble model may have a structure in which N sub-models are arranged in parallel to learn from N input variables. Each sub-model is trained on its classified feature data set and derives a power generation prediction result for that feature data set.

Because the power generation prediction method and device according to the embodiment train on the input variables separately, training and power generation prediction are performed so that multicollinearity among the input variables does not arise.

The power generation prediction results (Prediction 1, 2, 3, ..., N) derived from the sub-models (Sub-model 1, 2, 3, ..., N) may be stored in the memory 300 and/or the database 200. The processor 400 may then execute the power generation prediction program to derive a loss or a prediction accuracy for each prediction result (Prediction 1, 2, 3, ..., N).

The loss or error rate (Error Rate) may be derived using the bias-variance tradeoff, and the prediction result with the lowest loss or error, or with the highest prediction accuracy, may be taken as the final power generation prediction result.
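Selecting the final result among the sub-model predictions can be sketched as a simple argmin over a loss. The sketch assumes mean squared error against observed values as the loss; the text only says the loss or error rate is derived via the bias-variance tradeoff, so this metric and the toy series are stand-ins.

```python
def mse(pred, actual):
    """Mean squared error between a predicted and an observed series."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

def select_final_prediction(predictions, actual):
    """predictions maps a sub-model result name to its predicted series.
    Returns the name with the lowest loss, plus all losses."""
    losses = {name: mse(pred, actual) for name, pred in predictions.items()}
    best = min(losses, key=losses.get)
    return best, losses

actual = [1.0, 2.0, 3.0]  # hypothetical observed generation
predictions = {
    "Prediction 1": [1.0, 2.0, 4.0],
    "Prediction 2": [1.0, 2.0, 3.0],
    "Prediction 3": [0.0, 0.0, 0.0],
}
best, losses = select_final_prediction(predictions, actual)
print(best)  # Prediction 2
```

Selecting by highest accuracy instead only changes the scoring function and uses `max` in place of `min`.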

FIG. 10 is a conceptual diagram of data processing in the ensemble model according to an embodiment.

Referring to FIG. 10, FIG. 10(a) shows a preprocessing step. The feature data set input to each sub-structure of the ensemble model first goes through preprocessing, in which one or more of missing value handling, the Yeo-Johnson transformation, and min-max scaling may be applied to the feature data set.
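The three preprocessing operations can be sketched in plain Python. Assumptions: missing values are represented as `None` and filled with the mean of the observed values (the text does not specify the rule), and the Yeo-Johnson parameter lambda is fixed by hand, whereas in practice it is usually fitted by maximum likelihood. The Yeo-Johnson branches follow the standard definition of the transform.

```python
from math import log

def fill_missing(values):
    """Replace None with the mean of the observed values (hypothetical imputation rule)."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of one value for a given lambda."""
    if y >= 0:
        return log(y + 1) if lam == 0 else ((y + 1) ** lam - 1) / lam
    return -log(-y + 1) if lam == 2 else -((-y + 1) ** (2 - lam) - 1) / (2 - lam)

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant column
    return [(v - lo) / span for v in values]

def preprocess(values, lam=0.5):
    """Missing value handling, then Yeo-Johnson, then min-max scaling."""
    filled = fill_missing(values)
    transformed = [yeo_johnson(v, lam) for v in filled]
    return min_max_scale(transformed)

print(preprocess([1.0, None, 3.0], lam=1.0))  # [0.0, 0.5, 1.0]
```

With lambda = 1 the Yeo-Johnson transform is the identity, which makes the usage line easy to check by hand. Other lambda values compress or stretch skewed data before scaling.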

FIG. 10(b) shows the layer structure for training and prediction. The preprocessed feature data set is used for training and power generation prediction through this layer structure. The layers may include one or more of RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), and DENSE structures.

FIG. 10(c) illustrates the hyperparameter settings. In the hyperparameter setting step, the shape, number of units, activation functions, biases, and the like can be set. Typically, tanh or ReLU may be used as the activation function.
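To show what the units, activation and bias hyperparameters control, here is a minimal forward pass of one DENSE (fully connected) layer in plain Python. The weights are invented and no training is shown. In a framework such as Keras the same settings appear as the `units`, `activation` and `use_bias` arguments of `tf.keras.layers.Dense` (and `units`/`activation` for `tf.keras.layers.LSTM`); the patent does not state which framework is used.

```python
from math import tanh

# Activation functions named in the text (tanh, ReLU), plus identity for comparison.
ACTIVATIONS = {"tanh": tanh, "relu": lambda v: max(0.0, v), "linear": lambda v: v}

def dense_forward(x, weights, bias=None, activation="tanh"):
    """One fully connected layer: y = act(W x + b).

    weights holds one row per unit, so len(weights) plays the role of the
    'units' hyperparameter; bias=None behaves like a layer without bias."""
    act = ACTIVATIONS[activation]
    if bias is None:
        bias = [0.0] * len(weights)
    return [act(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

# Two units over a two-feature input (hypothetical weights).
print(dense_forward([3.0, 1.0], [[1.0, -1.0], [2.0, 0.0]], [0.0, -5.0], "relu"))  # [2.0, 1.0]
```

ReLU clips negative pre-activations to zero, while tanh squashes them into (-1, 1). This difference is one reason the choice is treated as a tunable hyperparameter.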

FIG. 10(d) visualizes the layer structure. In the preprocessing step (Preprocessing), the feature data set input to each sub-structure of the ensemble model is preprocessed, and the preprocessed feature data set is then passed through the layers (LSTM, DENSE) for training and power generation prediction.

The power generation prediction values derived from each sub-model may be stored in the memory 300 and/or the database 200. The processor 400 may use the bias-variance tradeoff to derive a loss or error rate (Error Rate) for each prediction result (Prediction 1, 2, 3, ..., N), and the prediction result with the lowest loss or error, or with the highest prediction accuracy, may be taken as the final power generation prediction result.

Hereinafter, a method of generating a fourth graph according to an embodiment will be described with reference to FIGS. 11 to 19.

FIG. 11 is a flowchart of a power generation prediction method according to an embodiment.

Referring to FIG. 11, the power generation prediction method according to the embodiment may further include a fourth graph generation step (S400). The fourth graph visually represents the process by which a power generation prediction result is derived from the input variables; that is, it represents the causal relationship between the input variable data and the power generation prediction result.

FIG. 12 is a flowchart of a method of generating the fourth graph according to an embodiment.

Referring to FIG. 12, in the step of deriving the contribution between the input variables and the prediction result (S410), the contributions are derived. Then, in the fifth graph generation step (S420), a fifth graph is generated using these contributions.

Specifically, the contribution of each input variable to the prediction result may be derived using SHAP (SHapley Additive exPlanations) values. Fifth graphs based on SHAP values may be derived as shown in FIGS. 13, 14, and 15.
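The contribution behind SHAP values is the Shapley value from cooperative game theory. The following sketch computes it exactly by enumerating every coalition of input variables. The assumption here is that a variable outside a coalition is replaced by a fixed baseline value, which is one common simplification; SHAP implementations such as the `shap` package also average over a background dataset and use faster approximations. The toy models are invented, and the cost grows as 2^n, so exact enumeration only suits a handful of inputs.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature of instance x.

    A feature outside a coalition takes its baseline value (an assumption)."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0..n-1 not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution
        phi.append(total)
    return phi

linear = lambda z: 2 * z[0] - z[1] + 0.5 * z[2]  # hypothetical model
print([round(v, 6) for v in shapley_values(linear, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])])
# [2.0, -2.0, 1.5]
```

The values always sum to model(x) minus model(baseline). This additivity is why each instance's prediction in FIG. 15 can be read as a base value plus positive (red) and negative (blue) contributions.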

FIG. 13 shows an example SHAP summary plot (Summary plot). The x-axis of FIG. 13 is the SHAP value, which indicates the contribution of an input variable. The y-axis lists the input variables, sorted in descending order of contribution.

Each dot in FIG. 13 is an instance of an input variable, and its color indicates the magnitude of the instance value: redder dots correspond to higher values and bluer dots to lower values. Clusters of dots indicate a dense distribution of SHAP values.

Accordingly, FIG. 13 is a SHAP summary plot for temperature (temprt), humidity (humdt), cloud cover (cloud), and precipitation (rain) among the plurality of input variables. Temperature, located at the top, contributes most to power generation, and precipitation contributes least.
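The vertical order of a summary plot can be reproduced by ranking variables by the mean absolute SHAP value over all instances. This is the usual global-importance measure for this kind of plot, but here it is an assumption, and the SHAP matrix below is invented to mirror the ordering described for FIG. 13.

```python
def rank_by_mean_abs_shap(shap_rows, names):
    """Order features as on a summary plot: largest mean |SHAP| first.

    shap_rows holds one list of SHAP values per instance, aligned with names."""
    n = len(shap_rows)
    importance = {name: sum(abs(row[j]) for row in shap_rows) / n
                  for j, name in enumerate(names)}
    return sorted(names, key=importance.get, reverse=True), importance

names = ["rain", "cloud", "humdt", "temprt"]
shap_rows = [  # hypothetical SHAP values for three instances
    [-1.0, -5.0, -8.0, 30.0],
    [0.5, 4.0, 6.0, -20.0],
    [-0.5, -6.0, -7.0, 25.0],
]
order, importance = rank_by_mean_abs_shap(shap_rows, names)
print(order)  # ['temprt', 'humdt', 'cloud', 'rain']
```

Using the absolute value means a variable that strongly lowers predictions ranks as highly as one that strongly raises them. The sign information is what the dot positions and colors in the plot add.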

FIG. 14 shows an example decision plot (Decision plot). A decision plot shows the path by which a prediction is reached. The path runs from the bottom of the graph to the top, and each contribution is expressed by the direction and distance the line moves: movement to the right of the center indicates a positive contribution, and movement to the left indicates a negative contribution.

The distance moved indicates the magnitude of the contribution. The color of each line is assigned according to the quantile of that instance among the prediction results of all instances. Thus, temperature (temprt), along which the line moves the longest distance, has the largest contribution, and precipitation (rain), along which it moves the shortest distance, has the smallest contribution.
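One line of a decision plot is a running sum that starts at the base value and adds one contribution per variable. The following sketch computes the successive x-positions. It assumes that variables are stacked from least to most important by the absolute value of this instance's contributions, whereas plotting libraries typically order by importance across all instances. The numbers are invented.

```python
def decision_path(base_value, contributions):
    """Cumulative x-positions of one decision-plot line, bottom to top.

    Variables are stacked from smallest to largest |contribution| (an assumption),
    so the most influential variable is drawn last, at the top."""
    order = sorted(contributions, key=lambda name: abs(contributions[name]))
    xs = [base_value]
    for name in order:
        xs.append(xs[-1] + contributions[name])  # rightward if positive, leftward if negative
    return order, xs

order, xs = decision_path(100.0, {"temprt": 30.0, "cloud": -10.0, "rain": -2.0})
print(order)  # ['rain', 'cloud', 'temprt']
print(xs)     # [100.0, 98.0, 88.0, 118.0]
```

The horizontal step for each variable equals its contribution, so the last point is the model's prediction for that instance.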

FIG. 15 shows an example clustering SHAP plot (Clustering SHAP plot). The clustering SHAP plot shows how the prediction results vary across the entire input dataset.

The x-axis represents the data instances, sorted in descending order of prediction result. The y-axis represents the prediction result and the magnitude of the contributions. The prediction for each instance corresponds to the point where the red and blue regions meet; red indicates positive contributions and blue indicates negative contributions. Therefore, when the sum of the contributions of all input variables for an instance is positive, its prediction is higher than the reference value.

In addition, as shown in FIG. 12, the power generation prediction method according to the embodiment may generate not only a fifth graph based on the contributions but also a sixth graph based on the dependence between the input variables and the power generation prediction result.

Specifically, in the step of deriving the dependence between the input variables and the prediction result (S430), all other input variables (the marginal feature set) are held fixed, and the dependence is derived from the marginal effect that a change in one input variable (the selected feature set) has on the prediction result. Then, in the sixth graph generation step (S440), a sixth graph is generated using the dependence between the input variables and the power generation prediction result. The sixth graph may be derived as shown in FIG. 16.

FIG. 16 shows an example ICE (Individual Conditional Expectation) plot. An ICE plot shows, for every instance, how the predicted power generation changes as the value of an input variable changes. FIG. 16 is an example ICE plot for the input variables power generation (power), wind speed (wind), humidity (Humidity), irradiance (Irradiance), and temperature (Temperature).
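The ICE and PDP construction described here is straightforward to sketch. For each instance, only the selected input is swept over a grid while the other inputs stay at that instance's values. This produces one curve per instance. Averaging the curves point by point gives the partial dependence (the bold line). The model and instances below are hypothetical.

```python
def ice_curves(model, instances, feature, grid):
    """One curve per instance: vary only `feature` over `grid`, keep the rest fixed."""
    curves = []
    for inst in instances:
        curve = []
        for v in grid:
            z = list(inst)
            z[feature] = v  # the selected feature; all other inputs stay as observed
            curve.append(model(z))
        curves.append(curve)
    return curves

def partial_dependence(curves):
    """Point-wise average of the ICE curves, i.e. the PDP."""
    return [sum(col) / len(col) for col in zip(*curves)]

model = lambda z: 3 * z[0] + z[1]  # hypothetical model of two inputs
curves = ice_curves(model, [[0.0, 1.0], [0.0, 5.0]], feature=0, grid=[0.0, 1.0, 2.0])
print(curves)                      # [[1.0, 4.0, 7.0], [5.0, 8.0, 11.0]]
print(partial_dependence(curves))  # [3.0, 6.0, 9.0]
```

Parallel ICE curves, as here, indicate no interaction between the swept input and the others. Curves that cross or fan out reveal interactions that the averaged PDP alone would hide.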

In an ICE plot, dependence refers to how the prediction result changes with the value of the input variable. The dependence of each instance is drawn as its own line.

The bold line represents the average dependence over all instances, which is identical to a PDP (Partial Dependence Plot). In the ICE plot, the x-axis represents the values of the input variable and the y-axis represents the predicted power generation. In addition, the dependences of multiple models can be distinguished by color and displayed on a single ICE plot.

FIGS. 17 to 19 are exemplary diagrams of power generation prediction results according to an embodiment.

FIG. 17 is a graph showing the actual power generation values to be predicted together with the values predicted using the power generation prediction device and method according to the embodiment. The x-axis represents time (epoch) and the y-axis represents solar power generation.

L1 represents the solar power generation predicted using all input variables. L2 represents the actual solar power generation to be predicted. L3 and L4 represent the solar power generation predicted with different hyperparameter settings.

FIGS. 18 and 19 are graphs for checking the bias-variance tradeoff. The x-axis represents error (bias or variance) and the y-axis represents loss. L5 represents bias and L6 represents variance. The bias-variance tradeoff can be checked by analyzing overfitting and underfitting over time (epoch).

The total error can be defined as in Equation 1 below.

[Equation 1]

Total error = Bias² + Variance + Irreducible Error
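Equation 1 is the standard bias-variance decomposition of the expected squared error. Assume y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and assume ε is independent of the training set D from which the predictor f̂ is fitted. Write f̄(x) for the average prediction E_D[f̂(x)]. Then the error splits into three terms whose cross terms vanish in expectation:

```latex
\begin{aligned}
\mathbb{E}_{D,\varepsilon}\big[(y-\hat f(x))^2\big]
&= \mathbb{E}_{D,\varepsilon}\Big[\big((f(x)-\bar f(x)) + (\bar f(x)-\hat f(x)) + \varepsilon\big)^2\Big],
\qquad \bar f(x) = \mathbb{E}_D[\hat f(x)] \\
&= \big(f(x)-\bar f(x)\big)^2 + \mathbb{E}_D\big[(\hat f(x)-\bar f(x))^2\big] + \sigma^2 \\
&= \mathrm{Bias}^2 + \mathrm{Variance} + \text{Irreducible Error}.
\end{aligned}
```

The cross terms drop out because E_D[f̄(x) - f̂(x)] = 0 and E[ε] = 0 with ε independent of D. Only σ² is beyond the model's control, which is why tuning trades bias against variance.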

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Computer-readable media can be any available media that can be accessed by a computer, and include volatile and non-volatile media as well as removable and non-removable media. Computer-readable media may also include computer storage media, which include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Although the method and system of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is illustrative. Those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

1: Power generation prediction device    100: Generator
200: Database    300: Memory
400: Processor    500: Terminal

Claims

A method for predicting power generation, comprising:
generating a first graph representing the correlation between selected variable data, which is any one of a plurality of input variable data, and the other input variable data;
classifying the input variable data into a plurality of feature data sets based on the shape of the first graph; and
predicting power generation by inputting the input variable data into a power generation prediction model trained using the plurality of feature data sets.

The method for predicting power generation of claim 1, wherein generating the first graph comprises:
generating a second graph using one or more of the plurality of input variable data as an independent variable and/or a dependent variable;
deriving a correlation between the independent variable and the dependent variable using the second graph;
selecting a third graph whose correlation is greater than or equal to a preset reference value; and
generating the first graph using one or more of the input variables corresponding to the third graph.

The method for predicting power generation of claim 2, wherein classifying the input variable data into the plurality of feature data sets comprises:
classifying the input variable data into the feature data sets using one or more of the maximum value, minimum value, inflection point locations, number of inflection points, slope, distribution, and average value of the first graph.

The method for predicting power generation of claim 3, wherein predicting the power generation comprises:
generating a fourth graph in which one or more of the plurality of input variable data is set as an independent variable and the power generation prediction result is set as a dependent variable.

5. The method of claim 4, wherein the step of generating the fourth graph comprises:
deriving a contribution of the input variable data to the power generation prediction result and generating a fifth graph using the contribution.
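The contribution step above does not name a specific attribution method. As a hedged illustration, the sketch below assumes a linear stand-in model, for which the contribution of variable j to one prediction can be written in closed form as beta_j * (x_j - mean(x_j)); under a feature-independence assumption these values coincide with SHAP values for a linear model. The mean absolute contribution per variable is the kind of quantity a bar-style "fifth graph" could display. The model, data, and function names are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit y ~ X @ beta + intercept (a stand-in prediction model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def linear_contributions(X, beta, intercept):
    """Per-sample, per-variable contributions and the shared base value."""
    mu = X.mean(axis=0)
    phi = (X - mu) * beta                # shape (n_samples, n_variables)
    base = float(mu @ beta + intercept)  # prediction at the mean input
    return phi, base


rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 3))  # hypothetical inputs
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, 500)
beta, intercept = fit_linear(X, y)
phi, base = linear_contributions(X, beta, intercept)

importance = np.abs(phi).mean(axis=0)  # values a bar-style contribution graph could show
pred = X @ beta + intercept
print(importance)
```

The contributions are additive: for each sample, the base value plus the row sum of `phi` reproduces the model's prediction, which is what makes them readable as a per-variable breakdown.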

6. The method of claim 5, wherein the step of generating the fourth graph further comprises:
deriving a dependency between the input variable data and the power generation prediction result and generating a sixth graph using the dependency.
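The dependency step above is likewise not tied to a named method. A common choice, assumed here, is partial dependence: force one input variable to each value on a grid for every sample, average the model's predictions, and plot the averages against the grid as a "sixth graph". The `toy_model` function below is a hypothetical already-trained model used only to make the sketch runnable.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    averages = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v  # same value for every sample
        averages.append(predict(X_mod).mean())
    return np.array(averages)


def toy_model(X):
    # Hypothetical trained model: nonlinear in column 0, linear in column 1.
    return 5.0 * np.sin(X[:, 0]) + X[:, 1]


rng = np.random.default_rng(2)
X = rng.uniform(0.0, 3.0, size=(300, 2))
grid = np.linspace(0.0, 3.0, 7)
pd_curve = partial_dependence(toy_model, X, feature=0, grid=grid)
print(pd_curve)
```

For an additive model like this one, the curve recovers the shape of the column-0 effect exactly, shifted by the mean of the other term; when variables interact, it shows only an averaged effect.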

7. The method of claim 3 or 6, wherein the step of predicting power generation further comprises:
training an ensemble model using the plurality of feature data sets and deriving a power generation prediction result using the ensemble model.
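The claim above does not specify how the ensemble combines its members. A minimal sketch, assuming one base model per feature data set (each set given as a list of column indices) and a plain average of the members' predictions, is shown below. Least-squares linear models stand in for the neural base learners named in the following claim; the class names are hypothetical.

```python
import numpy as np


class LinearMember:
    """Least-squares base learner (stand-in for an RNN/LSTM/Dense model)."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


class FeatureSetEnsemble:
    """Trains one member per feature data set and averages their outputs."""

    def __init__(self, feature_sets):
        self.feature_sets = [list(cols) for cols in feature_sets]

    def fit(self, X, y):
        self.members_ = [LinearMember().fit(X[:, cols], y) for cols in self.feature_sets]
        return self

    def predict(self, X):
        preds = [m.predict(X[:, cols]) for m, cols in zip(self.members_, self.feature_sets)]
        return np.mean(preds, axis=0)


rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5  # hypothetical noiseless target
model = FeatureSetEnsemble([[0, 1], [0, 1, 2]]).fit(X, y)
y_hat = model.predict(X)
```

Averaging is the simplest combiner; weighting members by validation error or stacking them under a meta-model are common alternatives.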

8. The method of claim 7, wherein the step of predicting power generation further comprises:
training on the feature data sets and predicting power generation using one or more of RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), and DENSE (fully connected layer) structures.
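To make the recurrent structures named above concrete, the sketch below runs one LSTM layer over a sequence of input vectors and feeds the final hidden state to a Dense (fully connected) output unit. It covers the forward (prediction) pass only; training would normally be done in a deep-learning framework via backpropagation through time. The gate layout (input, forget, output, candidate) stacked in one weight matrix, the parameter names, and the random initialization are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, (4 * n_hidden, n_in)),      # input weights, all gates
        "U": rng.normal(0.0, 0.1, (4 * n_hidden, n_hidden)),  # recurrent weights
        "b": np.zeros(4 * n_hidden),
        "w_out": rng.normal(0.0, 0.1, n_hidden),              # Dense output layer
        "b_out": 0.0,
    }


def lstm_dense_forward(x_seq, p):
    """x_seq: (T, n_in) sequence, e.g. hourly weather features. Returns a scalar."""
    H = p["U"].shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in x_seq:
        z = p["W"] @ x_t + p["U"] @ h + p["b"]
        i = sigmoid(z[:H])           # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:])       # candidate cell update
        c = f * c + i * g            # cell state
        h = o * np.tanh(c)           # hidden state
    return float(p["w_out"] @ h + p["b_out"])


x_seq = np.random.default_rng(4).uniform(0.0, 1.0, size=(24, 3))  # 24 hypothetical time steps
y_pred = lstm_dense_forward(x_seq, init_params(n_in=3, n_hidden=8))
```

A plain (Elman) RNN would replace the four gates with the single update h = tanh(W x_t + U h + b).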

9. The method of claim 8, wherein the step of predicting power generation further comprises:
preprocessing the feature data sets by performing one or more of missing-value processing, Yeo-Johnson transformation, and min-max scaling on the feature data sets.
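The three preprocessing operations listed above can be sketched as plain NumPy functions. Assumptions: missing values are filled by linear interpolation along the time index, the Yeo-Johnson parameter lambda is fixed by hand (0.5 is hypothetical; it is usually estimated by maximum likelihood, e.g. with `scipy.stats.yeojohnson` or scikit-learn's `PowerTransformer`), and min-max scaling maps the result to [0, 1].

```python
import numpy as np


def fill_missing(x):
    """Fill NaNs by linear interpolation over the sample index (edges held constant)."""
    x = np.asarray(x, dtype=float).copy()
    nan = np.isnan(x)
    if nan.any():
        idx = np.arange(len(x))
        x[nan] = np.interp(idx[nan], idx[~nan], x[~nan])
    return x


def yeo_johnson(x, lmbda):
    """Yeo-Johnson power transform with a fixed lambda."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lmbda) > 1e-12:
        out[pos] = ((x[pos] + 1.0) ** lmbda - 1.0) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    if abs(lmbda - 2.0) > 1e-12:
        out[~pos] = -((-x[~pos] + 1.0) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


def min_max_scale(x):
    """Map values linearly onto [0, 1]; a constant series maps to zeros."""
    lo, hi = np.min(x), np.max(x)
    if hi == lo:
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


def preprocess(x, lmbda=0.5):
    return min_max_scale(yeo_johnson(fill_missing(x), lmbda))


raw = [5.0, np.nan, 20.0, 0.0, -2.0]  # hypothetical readings with one gap
scaled = preprocess(raw)
print(scaled)
```

In a real pipeline the minimum, maximum, and lambda would be fitted on the training data and reused unchanged on test data, so that no information leaks from the evaluation period.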

10. A power generation prediction device, comprising:
a memory that stores a power generation prediction program; and
a processor that executes the power generation prediction program stored in the memory,
wherein, by executing the power generation prediction program, the processor generates a first graph representing the correlation between the data of one selected variable, chosen from among a plurality of input variable data, and the data of the other input variables, classifies the input variable data into a plurality of feature data sets based on the shape of the first graph, and predicts power generation by inputting the input variable data into a power generation prediction model trained using the plurality of feature data sets.

11. The device of claim 10, wherein, by executing the power generation prediction program, the processor generates a second graph using one or more of the plurality of input variables as an independent variable and/or a dependent variable, derives the correlation between the independent variable and the dependent variable using the second graph, selects a third graph whose correlation is greater than or equal to a preset reference value, and generates the first graph using one or more of the input variables corresponding to the third graph.

12. The device of claim 11, wherein, by executing the power generation prediction program, the processor classifies the input variable data into the feature data sets using one or more of the maximum value, minimum value, inflection point locations, number of inflection points, slope, distribution, and average value of the first graph.

13. The device of claim 12, wherein, by executing the power generation prediction program, the processor generates a fourth graph in which one or more of the plurality of input variable data are set as independent variables and the power generation prediction result is set as the dependent variable.

14. The device of claim 13, wherein, by executing the power generation prediction program, the processor derives a contribution of the input variables to the power generation prediction result and generates a fifth graph using the contribution.

15. The device of claim 14, wherein, by executing the power generation prediction program, the processor derives a dependency between the input variables and the power generation prediction result and generates a sixth graph using the dependency.

16. The device of claim 12 or 15, wherein, by executing the power generation prediction program, the processor trains an ensemble model using the plurality of feature data sets and derives a power generation prediction result using the ensemble model.

17. The device of claim 16, wherein the power generation prediction program includes one or more of RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), and DENSE (fully connected layer) structures.

18. The device of claim 17, wherein, by executing the power generation prediction program, the processor preprocesses the feature data sets by performing one or more of missing-value processing, Yeo-Johnson transformation, and min-max scaling on the feature data sets.