KR102333271B1

KR102333271B1 - A method for selecting the number of hidden units using correlation of hidden units in an artificial neural network model and a method for predicting hydrological climate variables using the hidden unit

Info

Publication number: KR102333271B1
Application number: KR1020200070002A
Authority: KR
Inventors: 이태삼
Original assignee: 경상국립대학교산학협력단
Priority date: 2020-06-10
Filing date: 2020-06-10
Publication date: 2021-11-30
Also published as: WO2021251745A1

Abstract

The present invention relates to a method for selecting the number of hidden units using the mutual information (MI) of the hidden units in an artificial neural network model, and to a method for predicting hydrological climate variables using the same. More specifically, the number of hidden units is selected using the MI of the hidden units in the artificial neural network model, and climate factor time series data are analyzed using an artificial neural network model having the selected number of hidden units so that hydrological climate variables are output. As a result, input data fed to the artificial neural network model are processed faster, and analysis results can be derived more quickly.

Description

A method for selecting the number of hidden units using correlation of hidden units in an artificial neural network model and a method for predicting hydrological climate variables using the hidden unit

The present invention relates to a method for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model, and to a method for predicting hydrological climate variables using the same. More specifically, the number of hidden units is selected using the correlation (MI, mutual information) of the hidden units in the artificial neural network model, and climate factor time series data are analyzed using an artificial neural network model having the selected number of hidden units so that hydrological climate variables are output.

In the past, knowledge was obtained mainly from experts or the literature, but as problems have grown broader and more complex, acquiring knowledge by these traditional means has become difficult. As an alternative, and with the progress of the Fourth Industrial Revolution, methods have been proposed that extract knowledge by analyzing data to find patterns and rules.

That is, techniques for extracting knowledge from data have moved beyond traditional statistical methods, and artificial intelligence (AI) techniques such as artificial neural networks, decision trees, genetic algorithms, case-based reasoning systems, and fuzzy systems are now in use.

In particular, artificial neural networks are used in many problem domains to solve classification and prediction problems. For example, an artificial neural network can predict outcomes by analyzing financial data, as in insolvent-company prediction models and bond rating evaluation. It can also be used to evaluate and analyze climate change from the past to the present and its impacts, and the results can be applied to water resource management, hydrological design, and many other climate-related applications.

However, although an artificial neural network is insensitive to noise in the data and structurally robust, its internal learning process is produced by a complex mathematical model, so users find the results difficult to understand. The biggest problem pointed out is that, because of the complex structure of the network, analyzing the input data takes a long time.

Related Document 1 concerns a method of operating a deep-learning-based climate change prediction system. It has the advantage of predicting climate change accurately by detecting whether weather data are missing, but it decomposes a large amount of data and analyzes the decomposed data without considering the number of hidden layers, so rapid analysis is difficult.

KR 10-2020-0052806

The present invention is intended to solve the above problems. Its object is to provide a method for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model, and a method for predicting hydrological climate variables using the same, in which the most appropriate number of hidden units for the model is selected using the number of hidden units and the correlation (MI, mutual information) of the hidden units, so that the conventional process of splitting the data and determining the number of hidden units from separate data can be omitted.

Another object of the present invention is to provide such a selection method and prediction method in which the artificial neural network model has the most appropriate number of hidden units, so that the slowdown caused by the complex structure of the network is resolved, the input data fed to the model are processed faster, and analysis results can be derived more quickly.

A further object of the present invention is to provide such a selection method and prediction method in which climate factor (e.g., PDO) time series data are analyzed using an artificial neural network model having the number of hidden units selected in the hidden unit number selection step, and hydrological climate variables are output, so that hydrologically meaningful prediction data can be produced quickly.

To achieve the above objects, the method of the present invention for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model comprises: a data input step in which time series data are input to an artificial neural network model; an MI estimation step in which, for the time series data, the correlation (MI, mutual information) of the hidden units is estimated for each number of hidden units; an MI average value calculation step in which the average value of the correlation (MI) of the hidden units is calculated for each number of hidden units; a polynomial regression analysis step in which polynomial regression is used to derive a quadratic polynomial whose independent variable is the number of hidden units and whose dependent variable is the average value of the correlation (MI) of the hidden units; and a hidden unit number selection step in which the quadratic polynomial is differentiated once with respect to the number of hidden units and 0 is substituted for the average value of the correlation (MI) of the hidden units so that the correlation between hidden units is removed, whereby the most appropriate number of hidden units for the artificial neural network model is selected.

In the method of the present invention for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model, the average value of the correlation (MI, mutual information) of the hidden units is characterized in that, as the number of hidden units increases, the distinct character of each hidden unit is strengthened and the average value decreases, and after passing a minimum point the average value is maintained or increases again.

In the method of the present invention for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model, the polynomial regression analysis step is characterized in that the quadratic polynomial is calculated by [Equation 1] below.

[Equation 1]

Here, y is the average value of the correlation (MI, mutual information) of the hidden units, x is the number of hidden units, and the remaining symbols are constants.

In the method of the present invention for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model, the hidden unit number selection step is characterized in that the number of hidden units (x) is rounded at the first decimal place and thus derived as a positive integer.

Next, to achieve the above objects, the method of the present invention for predicting hydrological climate variables using the hidden unit number selection method comprises: a data input step in which climate factor time series data are input to an artificial neural network model; an MI estimation step in which, for the climate factor time series data, the correlation (MI, mutual information) of the hidden units is estimated for each number of hidden units; an MI average value calculation step in which the average value of the correlation (MI) of the hidden units is calculated for each number of hidden units; a polynomial regression analysis step in which polynomial regression is used to derive a quadratic polynomial whose independent variable is the number of hidden units and whose dependent variable is the average value of the correlation (MI) of the hidden units; a hidden unit number selection step in which the quadratic polynomial is differentiated once with respect to the number of hidden units and 0 is substituted for the average value of the correlation (MI) of the hidden units so that the correlation between hidden units is removed, whereby the most appropriate number of hidden units for the artificial neural network model is selected; and a hydrological climate variable output step in which the climate factor time series data are analyzed using an artificial neural network model having the number of hidden units selected in the hidden unit number selection step, and hydrological climate variables are output.

In the method of the present invention for predicting hydrological climate variables using the hidden unit number selection method, the average value of the correlation (MI, mutual information) of the hidden units is characterized in that, as the number of hidden units increases, the distinct character of each hidden unit is strengthened and the average value decreases, and after passing a minimum point the average value is maintained or increases again.

In the method of the present invention for predicting hydrological climate variables using the hidden unit number selection method, the climate factor time series data are characterized in being PDO (Pacific Decadal Oscillation) time series data, which represent the decadal oscillation of the Pacific.

In the method of the present invention for predicting hydrological climate variables using the hidden unit number selection method, the polynomial regression analysis step is characterized in that the quadratic polynomial is calculated by [Equation 2] below.

[Equation 2]

Here, y is the average value of the correlation (MI, mutual information) of the hidden units, and x is the number of hidden units.

In the method of the present invention for predicting hydrological climate variables using the hidden unit number selection method, the hidden unit number selection step is characterized in that the number of hidden units (x) is rounded at the first decimal place and thus derived as a positive integer.

As described above, according to the present invention, the most appropriate number of hidden units for the artificial neural network model is selected using the number of hidden units and the correlation (MI, mutual information) of the hidden units, so that the conventional process of splitting the data and determining the number of hidden units from separate data can be omitted.

In addition, because the artificial neural network model has the most appropriate number of hidden units, the present invention resolves the slowdown caused by the complex structure of the network, and the input data fed to the model are processed faster, so that analysis results can be derived more quickly.

In addition, because climate factor (e.g., PDO) time series data are analyzed using an artificial neural network model having the number of hidden units selected in the hidden unit number selection step, and hydrological climate variables are output, the present invention can quickly produce hydrologically meaningful prediction data.

FIG. 1 is a flowchart of a method for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model according to the present invention.
FIG. 2 is a flowchart of a method for predicting hydrological climate variables using the hidden unit number selection method according to an embodiment of the present invention.
FIG. 3 is a diagram showing PDO time series data according to an embodiment of the present invention.
FIG. 4 is a diagram showing the state of each hidden unit of LSTM4 according to an embodiment of the present invention.
FIG. 5 is a diagram showing the state of each hidden unit of LSTM10 according to an embodiment of the present invention.
FIG. 6 is a diagram showing the correlation between the hidden units of LSTM6 and of LSTM14 according to an embodiment of the present invention.
FIG. 7 is a graph showing [Equation 2] calculated in the polynomial regression analysis step according to an embodiment of the present invention.
FIG. 8 is a diagram showing the RMSE and the correlation of the hidden units as a function of the number of hidden units according to an embodiment of the present invention.

The terms used in this specification have been chosen, as far as possible, from general terms that are currently in wide use, taking into account their function in the present invention; their meaning may nevertheless vary with the intention of those skilled in the art, with precedent, or with the emergence of new technology. In certain cases the applicant has chosen a term arbitrarily, and in such cases its meaning is described in detail in the corresponding part of the description. Accordingly, a term used in the present invention should be defined on the basis of its meaning and the overall content of the present invention, not simply by its name.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this application.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. FIG. 1 is a flowchart of a method for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model according to the present invention.

Referring to FIG. 1, the method of the present invention for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model includes a data input step (S100), an MI estimation step (S200), an MI average value calculation step (S300), a polynomial regression analysis step (S400), and a hidden unit number selection step (S500).

More specifically, in the data input step (S100), time series data are input to the artificial neural network model.

In general, an artificial neural network includes an input layer, a hidden layer, and an output layer. When input data are fed to the input layer, the hidden layer analyzes them and provides output data to the output layer. The hidden layer may contain a plurality of hidden units, also called hidden nodes. This specification is written on the premise that the hidden layer contains a plurality of hidden units.

There are also various types of artificial neural networks. Among them, the recurrent neural network (RNN) is a model that can take time history into account, and RNNs include the general recurrent neural network and the LSTM (Long Short-Term Memory)-based recurrent neural network.

Most preferably, in the method of the present invention for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model, a general artificial neural network, an RNN, or an LSTM (Long Short-Term Memory)-based recurrent neural network may be used to analyze long-term time series data. Accordingly, the input data, the hidden units, and the output data may all be time series data.

The artificial neural network model may be named LSTM1, LSTM4, LSTM10, and so on, according to the number of hidden units. For example, LSTM1 is an artificial neural network with one hidden unit in the hidden layer, and LSTM4 is an artificial neural network with four hidden units in the hidden layer.
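To make the naming concrete, the following minimal sketch runs a single-layer LSTM with a chosen number of hidden units over a one-dimensional series and collects the state of every hidden unit at every time step. It is only an illustration under stated assumptions: the weights are random and untrained, whereas the model described here would be trained, and the names `lstm_hidden_states`, `n_hidden`, and `seed` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_hidden_states(series, n_hidden, seed=0):
    """Return the hidden state of each unit over time, shape (T, n_hidden).

    Assumption: weights are random (untrained); a real LSTMn would be fitted
    to the data before its hidden states are examined.
    """
    rng = np.random.default_rng(seed)
    # Stacked weights for the input, forget, candidate and output gates.
    w_in = rng.normal(scale=0.5, size=4 * n_hidden)               # input weights
    w_rec = rng.normal(scale=0.5, size=(4 * n_hidden, n_hidden))  # recurrent weights
    bias = np.zeros(4 * n_hidden)

    h = np.zeros(n_hidden)  # hidden state
    c = np.zeros(n_hidden)  # cell state
    states = []
    for x_t in np.asarray(series, dtype=float):
        z = w_in * x_t + w_rec @ h + bias
        i = sigmoid(z[:n_hidden])                  # input gate
        f = sigmoid(z[n_hidden:2 * n_hidden])      # forget gate
        g = np.tanh(z[2 * n_hidden:3 * n_hidden])  # candidate cell value
        o = sigmoid(z[3 * n_hidden:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.array(states)


# "LSTM4": four hidden units, each yielding its own time series (one column).
hidden = lstm_hidden_states(np.sin(np.linspace(0.0, 6.0, 50)), n_hidden=4)
```

Each column of `hidden` is the time series of one hidden unit, which is the form the later MI estimation operates on.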

Next, in the MI estimation step (S200), the correlation (MI, mutual information) of the hidden units is estimated for each number of hidden units, for the time series data.

In the MI estimation step (S200), the correlation (MI, mutual information) of the hidden units is estimated using a bivariate empirical cumulative distribution function. Here, the bivariate empirical cumulative distribution function is the cumulative distribution function associated with the empirical measure; it is obtained by computing statistics for a random sample and arranging them in ascending order, which yields a step-shaped graph.
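As a rough illustration of this step, the sketch below estimates the MI between two series by mapping each series through its own empirical CDF (a rank transform) and then binning the transformed pairs on a grid. This is one possible estimator chosen for the example, not necessarily the one intended here, since the text states only that a bivariate empirical cumulative distribution function is used. The function names and the `bins` parameter are hypothetical.

```python
import numpy as np


def empirical_cdf(x):
    """Empirical CDF value of each sample: rank / (n + 1), strictly in (0, 1)."""
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)


def mutual_information(x, y, bins=8):
    """Plug-in MI estimate (in nats) from binned empirical-CDF values.

    Assumption: a histogram of the rank-transformed pairs stands in for the
    bivariate empirical distribution; `bins` is an illustrative choice.
    """
    u = empirical_cdf(np.asarray(x, dtype=float))
    v = empirical_cdf(np.asarray(y, dtype=float))
    joint, _, _ = np.histogram2d(u, v, bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    p_uv = joint / joint.sum()
    p_u = p_uv.sum(axis=1, keepdims=True)  # marginal of u, shape (bins, 1)
    p_v = p_uv.sum(axis=0, keepdims=True)  # marginal of v, shape (1, bins)
    nonzero = p_uv > 0
    return float(np.sum(p_uv[nonzero] * np.log(p_uv[nonzero] / (p_u @ p_v)[nonzero])))
```

With this kind of estimator, a series compared with itself gives a high value, while two unrelated series give a value near zero (slightly positive because of finite-sample bias).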

Taking LSTM4 as an example, the hidden layer of LSTM4 contains four hidden units. Each hidden unit can be represented as a time series, and the correlation (MI, mutual information) between each pair of hidden units can be obtained.

In this case, in the MI estimation step (S200), the correlation (MI, mutual information) between the four hidden units can be estimated using the bivariate empirical cumulative distribution function, and the results can be denoted as a first MI through an n-th MI.

Next, in the MI average value calculation step (S300), the average value of the correlation (MI, mutual information) of the hidden units is calculated for each number of hidden units.

Taking LSTM4 as an example, averaging the first through n-th MI estimated in the MI estimation step (S200) gives the final average value of the correlation (MI, mutual information) of the hidden units for LSTM4 as a whole.
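The averaging described above can be sketched as follows, assuming the hidden states are stored as an array with one column per hidden unit and that MI is computed for every unordered pair of units (six pairs for LSTM4). The function name `mean_pairwise_mi` and the pluggable `pair_mi` callable are hypothetical; any pairwise MI estimator can be passed in. A model with a single hidden unit has no pairs, and this sketch returns NaN in that case, which is an assumption rather than something the text specifies.

```python
from itertools import combinations

import numpy as np


def mean_pairwise_mi(hidden_states, pair_mi):
    """Average pair_mi over all unordered pairs of hidden-unit time series.

    hidden_states: array of shape (T, n_units), one column per hidden unit.
    pair_mi: callable taking two 1-D series and returning an MI estimate.
    """
    H = np.asarray(hidden_states, dtype=float)
    n_units = H.shape[1]
    values = [
        pair_mi(H[:, i], H[:, j])
        for i, j in combinations(range(n_units), 2)
    ]
    if not values:  # a single hidden unit has no pairs
        return float("nan")
    return float(np.mean(values))
```

Repeating this for each candidate model (LSTM1, LSTM2, and so on) yields one average value per number of hidden units.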

That is, in the MI average value calculation step (S300), the average value of the correlation (MI, mutual information) of the hidden units is calculated for each artificial neural network model from LSTM1 through LSTMn, in the same way as described above for LSTM4.

Here, the average value of the correlation (MI, mutual information) of the hidden units is characterized in that, as the number of hidden units increases, the distinct character of each hidden unit is strengthened and the average value decreases, and after passing a minimum point the average value is maintained or increases again.

Next, in the polynomial regression analysis step (S400), polynomial regression is used to derive a quadratic polynomial whose independent variable is the number of hidden units and whose dependent variable is the average value of the correlation (MI, mutual information) of the hidden units.

In general, polynomial regression is a method for analyzing an independent variable and a dependent variable that have a nonlinear relationship, and a quadratic polynomial can be derived by [Equation 1] below.

Here, y is the dependent variable and x is the independent variable. Substituting the corresponding data and solving for the constants allows the nonlinear relationship in the data to be expressed as a quadratic polynomial curve.

That is, in the present invention, y is the average value of the correlation (MI, mutual information) of the hidden units, x is the number of hidden units, and the remaining symbols are constants. In the polynomial regression analysis step (S400), arbitrary pairs of (number of hidden units, average value of the correlation of the hidden units) taken from the plurality of data are substituted into the polynomial to obtain the constants.
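A minimal sketch of this regression step is shown below. It assumes the generic quadratic form y = a·x² + b·x + c (the coefficient names are assumptions, since the original equation image is not reproduced here) and uses an ordinary least-squares fit over all available points via `numpy.polyfit`, which is a standard way to perform polynomial regression rather than the point-substitution wording used above. The data values are synthetic and purely illustrative; they are not results from this document.

```python
import numpy as np


def fit_quadratic(n_units, mean_mi):
    """Least-squares fit of mean_mi = a * n_units**2 + b * n_units + c."""
    x = np.asarray(n_units, dtype=float)
    y = np.asarray(mean_mi, dtype=float)
    a, b, c = np.polyfit(x, y, deg=2)  # coefficients, highest power first
    return float(a), float(b), float(c)


# Hypothetical (number of hidden units, mean MI) pairs generated from a known
# quadratic so that the fit can be checked by eye.
n_units = np.arange(2, 17)
mean_mi = 0.002 * (n_units - 9.0) ** 2 + 0.10

a, b, c = fit_quadratic(n_units, mean_mi)
```

Fitting all points at once is less sensitive to noise in individual MI averages than solving from a handful of selected points.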

Next, in the hidden unit number selection step (S500), the quadratic polynomial is differentiated once with respect to the number of hidden units, and 0 is substituted for the average value of the correlation (MI, mutual information) of the hidden units so that the correlation between hidden units is removed, whereby the most appropriate number of hidden units for the artificial neural network model is selected.

That is, in the hidden unit number selection step (S500), once [Equation 1] has been derived in the polynomial regression analysis step (S400), [Equation 1] can be differentiated once with respect to the number of hidden units (x), which is the independent variable, as in [Equation 3] below.

The reason for taking the first derivative is to find the minimum point: as noted above, as the number of hidden units increases, the distinct character of each hidden unit is strengthened and the average value of the correlation (MI, mutual information) of the hidden units decreases, and after passing the minimum point the average value is maintained or increases again.

Then, as in [Equation 3] below, 0 is substituted for the differentiated average value of the correlation (MI, mutual information) of the hidden units, which is the dependent variable, and the most appropriate number of hidden units for the artificial neural network model can thereby be selected.
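For reference, the differentiation can be written out as a short worked derivation. It assumes the generic quadratic form with coefficients a, b, and c; these symbol names are assumptions, since the original equation images are not reproduced here, and a > 0 is assumed so that the stationary point is a minimum.

```latex
\begin{align}
  y &= a x^{2} + b x + c, \qquad a > 0 \\
  \frac{dy}{dx} &= 2 a x + b \\
  \frac{dy}{dx} = 0 \;&\Longrightarrow\; x^{*} = -\frac{b}{2a}
\end{align}
```

The value x* is the number of hidden units at which the fitted average MI is smallest; it is generally not an integer.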

In this case, the hidden unit number selection step (S500) is characterized in that the number of hidden units (x) is rounded at the first decimal place and thus derived as a positive integer. This is because the number of hidden units (x) in an artificial neural network model must be a positive integer.
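The selection and rounding can be sketched as follows, assuming the fitted quadratic has the generic form y = a·x² + b·x + c with a > 0 (the coefficient names are assumptions). The function name `select_hidden_units` is hypothetical, rounding is done half-up at the first decimal place, and clamping the result to at least 1 is an added safeguard that the text does not describe.

```python
import math


def select_hidden_units(a, b):
    """Return the rounded minimizer of a*x**2 + b*x + c as a positive integer."""
    if a <= 0:
        # Without a > 0 the parabola has no minimum to select.
        raise ValueError("quadratic must open upward (a > 0) to have a minimum")
    x_star = -b / (2.0 * a)           # zero of the first derivative 2*a*x + b
    n = math.floor(x_star + 0.5)      # round half up at the first decimal place
    return max(1, n)                  # assumption: clamp to a positive integer
```

For example, coefficients a = 1 and b = -15 give a stationary point at 7.5, which rounds to 8 hidden units.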

Method for selecting the number of hidden units and hydrological climate variable prediction method using the same

FIG. 2 is a flowchart of a hydrological climate variable prediction method using the hidden unit number selection method according to an embodiment of the present invention.

Referring to FIG. 2, the hydrological climate variable prediction method using the hidden unit number selection method of the present invention includes a data input step (S110), an MI estimation step (S210), an MI average value calculation step (S310), a polynomial regression analysis step (S410), a hidden unit number selection step (S510), and a hydrological climate variable output step (S610).

More specifically, in the data input step (S110), climate factor time series data is input to the artificial neural network model.

Most preferably, the climate factor time series data is PDO (Pacific Decadal Oscillation) time series data representing the decadal-scale oscillation of the Pacific. The PDO time series data expresses, as a time series, the Pacific climate variability pattern, which is one of the most representative climate indices.

Also, most preferably, a recurrent neural network based on LSTM (Long Short-Term Memory) may be used as the artificial neural network model in order to analyze long-term time series data. Accordingly, the PDO time series data serving as input data, the hidden units, and the hydrological climate variable serving as output data may all be time series data.

The artificial neural network model may be named LSTM1, LSTM4, LSTM10 and so on according to its number of hidden units. For example, LSTM1 is an artificial neural network model with one hidden unit in the hidden layer, and LSTM4 is an artificial neural network model with four hidden units in the hidden layer.

Next, in the MI estimation step (S210), the correlation (MI, Mutual Information) of the hidden units is estimated for each number of hidden units, for the climate factor time series data.

In the MI estimation step (S210), the correlation (MI, Mutual Information) of the hidden units is estimated using a bivariate empirical cumulative distribution function. Here, the bivariate empirical cumulative distribution function is the cumulative distribution function associated with the empirical measure of a sample; it is obtained by computing the statistic for a random sample and arranging the values in ascending order, and it can be drawn as a step-shaped graph.
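As a rough illustration of this idea, the sketch below builds a bivariate empirical cumulative distribution function and derives a plug-in MI estimate from it. It is not the estimator used in the embodiment, whose details are not given in this text: the function names, the quantile grid, and the `bins` parameter are all assumptions made for the example.

```python
from math import log


def bivariate_ecdf(xs, ys):
    """Return F(x, y): the fraction of samples with X <= x and Y <= y."""
    pairs = list(zip(xs, ys))
    n = len(pairs)

    def F(x, y):
        return sum(1 for u, v in pairs if u <= x and v <= y) / n

    return F


def quantile_edges(values, bins):
    """Upper bin edges placed at empirical quantiles (an assumed grid choice)."""
    s = sorted(values)
    n = len(s)
    return [s[max(0, (k * n) // bins - 1)] for k in range(1, bins + 1)]


def mi_from_ecdf(xs, ys, bins=4):
    """Plug-in MI estimate (in nats) from ECDF differences over a grid."""
    F = bivariate_ecdf(xs, ys)
    ex, ey = quantile_edges(xs, bins), quantile_edges(ys, bins)

    def G(i, j):
        # ECDF at grid node (i, j); index -1 stands for minus infinity.
        return 0.0 if i < 0 or j < 0 else F(ex[i], ey[j])

    # Cell probability = rectangle mass implied by the ECDF.
    p = [[G(i, j) - G(i - 1, j) - G(i, j - 1) + G(i - 1, j - 1)
          for j in range(bins)] for i in range(bins)]
    px = [sum(row) for row in p]
    py = [sum(p[i][j] for i in range(bins)) for j in range(bins)]
    return sum(p[i][j] * log(p[i][j] / (px[i] * py[j]))
               for i in range(bins) for j in range(bins) if p[i][j] > 1e-12)
```

Two identical series give a large MI, while two series arranged to be independent on the grid give an MI of zero.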

In the MI estimation step (S210), for example with four hidden units, the correlation (MI, Mutual Information) between the hidden units may be estimated using the bivariate empirical cumulative distribution function, and the results may be denoted as the first MI through the n-th MI.

Next, in the MI average value calculation step (S310), the average value of the correlation (MI, Mutual Information) of the hidden units is calculated for each number of hidden units.

Taking LSTM4 as an example, averaging the first MI through the n-th MI estimated in the MI estimation step (S210) yields the final average value of the correlation (MI, Mutual Information) of the hidden units for LSTM4, which has four hidden units.

That is, in the MI average value calculation step (S310), the average value of the correlation (MI, Mutual Information) of the hidden units is calculated for each artificial neural network model from LSTM1 to LSTMn in the same way as described above for LSTM4.
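The averaging step can be sketched as follows. To keep the example self-contained, a simple equal-width histogram estimator stands in for the MI estimate; it is a substitute for illustration, not the bivariate empirical-CDF estimator of the embodiment. The names `binned_mi`, `mean_pairwise_mi` and the `bins` parameter are hypothetical.

```python
from itertools import combinations
from math import log


def binned_mi(xs, ys, bins=4):
    """Plug-in MI estimate (nats) from an equal-width 2-D histogram."""
    def index(v, lo, hi):
        if hi == lo:
            return 0
        return min(bins - 1, int((v - lo) / (hi - lo) * bins))

    n = len(xs)
    lox, hix, loy, hiy = min(xs), max(xs), min(ys), max(ys)
    joint = {}
    for x, y in zip(xs, ys):
        key = (index(x, lox, hix), index(y, loy, hiy))
        joint[key] = joint.get(key, 0) + 1
    px, py = {}, {}
    for (i, j), c in joint.items():
        px[i] = px.get(i, 0) + c
        py[j] = py.get(j, 0) + c
    return sum(c / n * log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def mean_pairwise_mi(states):
    """Average MI over all pairs of hidden-unit state series.

    `states` holds one list of values per hidden unit.
    """
    pairs = list(combinations(range(len(states)), 2))
    if not pairs:
        raise ValueError("at least two hidden units are needed to form a pair")
    return sum(binned_mi(states[i], states[j]) for i, j in pairs) / len(pairs)
```

Calling `mean_pairwise_mi` once per candidate model (one call for the hidden states of each LSTM size) yields the points that the subsequent regression step fits.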

Here, the average value of the correlation (MI, Mutual Information) of the hidden units decreases as the number of hidden units increases, because the unique characteristics of each hidden unit are strengthened; past the minimum point, the average value is maintained or increases again.

A simple experiment is conducted to demonstrate this.

FIG. 3 shows PDO time series data according to an embodiment of the present invention. FIG. 4 shows the state of each hidden unit of LSTM4 according to an embodiment of the present invention. FIG. 5 shows the state of each hidden unit of LSTM10 according to an embodiment of the present invention.

First, referring to FIG. 3, PDO time series data is used as the input data in this experiment. The PDO time series data expresses, as a time series, the Pacific climate variability pattern, which is one of the most representative climate indices.

Referring to FIG. 4, the PDO time series data was analyzed using the LSTM4 artificial neural network model, which has four hidden units, and the state of each hidden unit (black graph) is overlaid on the PDO time series graph.

More specifically, in the ht-1 graph of FIG. 4, the first hidden unit shows high-frequency variability. The first hidden unit tracks the high-frequency component and is close to normal. In contrast, the second to fourth hidden units are each sensitive to only one phase, positive or negative, and are therefore heavily skewed.

During high-frequency periods such as 1920-1940 and 1980-2000, the ht-2 graph of FIG. 4 shows that the state of the second hidden unit takes high positive values, whereas the ht-4 graph of FIG. 4 shows that the state of the fourth hidden unit takes high negative values.

In addition, the second and third hidden units look almost identical and are strongly dependent on each other, whereas the first and fourth hidden units are almost independent of each other.

Next, the LSTM10 model is additionally analyzed to characterize a structure with a relatively large number of hidden units.

Referring to FIG. 5, the PDO time series data was analyzed using the LSTM10 artificial neural network model, which has ten hidden units, and the state of each hidden unit (black graph) is overlaid on the PDO time series data.

The variability of the first period, 1900-1930, is captured by the ninth and tenth hidden units, while the other hidden units consistently hold a value of -1 or 1 throughout the first period. That is, among the ten hidden units, the ninth and tenth carry the significant spectrum in the first period.

The third hidden unit appears to have a negative relationship with the fourth hidden unit and a positive relationship with the seventh hidden unit. The first hidden unit, as shown in the ht-1 graph of FIG. 5, is very sensitive to the peak values of the PDO time series data, whereas the eighth hidden unit is inversely sensitive to negative peaks. The sixth hidden unit shows a long-term negative relationship with the PDO time series data except in the first period.

The remaining units, however, do not depend on one another, so the hidden units in the LSTM10 model can be seen to have lower correlation than the hidden units in the LSTM4 model.

In addition, FIG. 6 shows the correlation between the hidden units of LSTM6 and of LSTM14 according to an embodiment of the present invention.

FIG. 6 is a plot matrix in which the hidden units of the LSTM are placed on the diagonal panels and their correlations are shown in the remaining panels. That is, when the panel at the intersection of one hidden unit and another shows a sharp graph with little scatter, the positive or negative correlation between those two hidden units can be said to be high.

FIG. 6(a) is the plot matrix of LSTM6, which has 6 hidden units; every panel shows a sharp graph with little scatter that clearly indicates a positive or negative correlation. FIG. 6(b) is the plot matrix of LSTM14, which has 14 hidden units; every panel shows an indistinct graph with heavy scatter.

That is, with a small number of hidden units, the PDO time series data used as input is not sufficiently separated across the hidden units, so the hidden units are strongly correlated with one another. With a large number of hidden units, by contrast, the PDO time series data is sufficiently separated, so the hidden units are only weakly correlated.

However, once the number of hidden units exceeds the number at which the average value of the correlation (MI, Mutual Information) of the hidden units is at its minimum, that average value may be maintained or may increase again. This is described in more detail below.

Next, in the polynomial regression analysis step (S410), polynomial regression is used to derive a second-order polynomial in which the independent variable is the number of hidden units and the dependent variable is the average value of the correlation (MI, Mutual Information) of the hidden units.

Most preferably, in the polynomial regression analysis step (S410), when the climate factor time series data is PDO (Pacific Decadal Oscillation) time series data, the second-order polynomial is calculated as the following [Equation 2].

Here, the dependent variable y is the average value of the correlation (MI, Mutual Information) of the hidden units, and the independent variable x is the number of hidden units.
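A minimal sketch of such a second-order fit is shown below, using ordinary least squares on the normal equations in plain Python. The function name `fit_quadratic` and the coefficient names a, b, c are assumptions for the example; the sample values in the usage note are made up and are not the data behind [Equation 2].

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    # Power sums needed for the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix [M | rhs] for the unknowns (a, b, c).
    A = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for k in range(col, 4):
                A[r][k] -= f * A[col][k]
    # Back substitution.
    sol = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(A[i][k] * sol[k] for k in range(i + 1, 3))
        sol[i] = (A[i][3] - rest) / A[i][i]
    return tuple(sol)
```

For instance, passing hidden-unit counts `xs = [1, 2, ..., 10]` together with the corresponding average MI values as `ys` returns the three coefficients of the fitted parabola.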

FIG. 7 is a graph of [Equation 2] calculated in the polynomial regression analysis step according to an embodiment of the present invention.

Referring to FIG. 7, the average values of the correlation (MI, Mutual Information) of the hidden units calculated in the MI average value calculation step (S310) are plotted as points against the number of hidden units. The graph of [Equation 2] calculated in the polynomial regression analysis step (S410) is shown in purple.

FIG. 7 also shows that once the number of hidden units exceeds the number at which the average value of the correlation (MI, Mutual Information) of the hidden units is at its minimum, that average value increases again.

That is, when the number of hidden units increases beyond a certain level, the average value of the correlation (MI, Mutual Information) of the hidden units increases again, and the added analysis load can slow computation. Moreover, because the number of parameters grows rapidly with the number of hidden units, selecting an appropriate number of hidden units is very important.
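To give a sense of how the parameter count scales, the sketch below counts the trainable parameters of a single standard LSTM layer: four gates, each with an input weight matrix, a recurrent weight matrix, and one bias vector. The function name and the default `input_dim=1` (a univariate input series) are assumptions; an output layer is not counted, and some libraries keep two bias vectors per gate, which adds a further 4 × hidden_units.

```python
def lstm_param_count(hidden_units, input_dim=1):
    """Trainable parameters of one standard LSTM layer (single bias per gate)."""
    per_gate = hidden_units * (hidden_units + input_dim) + hidden_units
    return 4 * per_gate


# Parameter counts for the model sizes discussed in the text.
counts = {f"LSTM{h}": lstm_param_count(h) for h in (1, 4, 10, 14)}
```

Under these assumptions the count grows roughly with the square of the number of hidden units, so LSTM14 has about nine times as many layer parameters as LSTM4.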

Next, in the hidden unit number selection step (S510), the second-order polynomial is differentiated once with respect to the number of hidden units, and 0 is substituted for the average value of the correlation (MI, Mutual Information) of the hidden units so that the correlation between the hidden units is removed, whereby the most appropriate number of hidden units for the artificial neural network model is selected.

In addition, when the second-order polynomial is calculated as [Equation 2] in the polynomial regression analysis step (S410), the hidden unit number selection step (S510) differentiates [Equation 2] once with respect to the number of hidden units (x).

The first derivative is taken in order to find the minimum point. As mentioned above, as the number of hidden units increases, the unique characteristics of each hidden unit are strengthened and the correlation (MI, Mutual Information) of the hidden units decreases; past the minimum point, the correlation is maintained or increases again.

In the hidden unit number selection step (S510), when 0 is substituted for the average value of the correlation (MI, Mutual Information) of the hidden units so that the correlation between the hidden units is removed, the number of hidden units (x) of the artificial neural network model is calculated as 14.04.

Because the number of hidden units (x) must be a positive integer, the value is rounded at the first decimal place, giving 14.

That is, by differentiating [Equation 2] derived in the polynomial regression analysis step (S410), the hidden unit number selection step (S510) can easily find the number of hidden units at which the correlation (MI, Mutual Information) of the hidden units is at its minimum.
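The selection step can be sketched in a few lines, assuming the fitted polynomial has the form y = a·x² + b·x + c (a, b, c are illustrative names). The coefficients used in the usage note are hypothetical values chosen only so that the vertex falls at 14.04; they are not the coefficients of [Equation 2], which are not reproduced in this text.

```python
from math import floor


def select_hidden_units(a, b):
    """Locate the minimum of y = a*x**2 + b*x + c and round it to an integer.

    Returns (selected_count, x_star). The constant c does not affect the
    location of the minimum, so it is not needed here.
    """
    if a <= 0:
        raise ValueError("the parabola has no minimum unless a > 0")
    x_star = -b / (2 * a)        # root of dy/dx = 2*a*x + b
    n = floor(x_star + 0.5)      # round half up at the first decimal place
    return max(1, n), x_star     # the count must be a positive integer
```

With the hypothetical values `a = 0.001` and `b = -0.02808`, `select_hidden_units(a, b)` gives a vertex near 14.04 and a selected count of 14. `floor(x + 0.5)` is used rather than Python's built-in `round`, which rounds exact halves to the nearest even integer.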

Next, in the hydrological climate variable output step (S610), the PDO time series data is analyzed using the artificial neural network model having the number of hidden units selected in the hidden unit number selection step (S510), and a hydrological climate variable is output.

That is, the PDO time series data consists of values observed in the past; when it is input to the artificial neural network model, the hydrological climate variable output step (S610) outputs meaningful hydrological data that may occur in the future.

Comparison with the conventional approach

FIG. 8 shows the RMSE and the correlation of the hidden units as functions of the number of hidden units according to an embodiment of the present invention.

To demonstrate the accuracy and reliability of the present invention, the commonly used RMSE value for each number of hidden units is compared with the average value of the correlation (MI, Mutual Information) of the hidden units for each number of hidden units according to the present invention.

In FIG. 8, the blue graph is the commonly used RMSE value for each number of hidden units, and the red graph is the average value of the correlation (MI, Mutual Information) of the hidden units for each number of hidden units according to the present invention.

The root mean square error (RMSE) may be derived through k-fold cross-validation. In k-fold cross-validation, k is a positive integer and the entire data set is divided into k subsets of similar size.

For example, when k=5, the entire data set is divided into 5 subsets (folds); in each round one fold is used for testing and the remaining 4 folds are used for training, and this process is repeated so that accuracy is measured once for each fold.
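The procedure can be sketched as follows. The text does not say how the folds are formed or which model is scored, so contiguous folds and a trivial mean predictor are used here as stand-ins; the function names and the `fit`/`predict` callback interface are assumptions for the example.

```python
from math import sqrt


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_rmse(ys, k, fit, predict):
    """Return (mean RMSE over folds, list of per-fold RMSE values)."""
    scores = []
    for test_idx in k_fold_indices(len(ys), k):
        held_out = set(test_idx)
        train = [ys[i] for i in range(len(ys)) if i not in held_out]
        model = fit(train)  # train on the other k-1 folds
        errs = [(predict(model, i) - ys[i]) ** 2 for i in test_idx]
        scores.append(sqrt(sum(errs) / len(errs)))
    return sum(scores) / k, scores


def fit_mean(train):
    """Stand-in model: remember the training mean."""
    return sum(train) / len(train)


def predict_mean(model, index):
    """Stand-in prediction: always return the stored mean."""
    return model
```

In practice `fit` and `predict` would wrap the training and evaluation of an LSTM with a given number of hidden units, and the mean RMSE would be recorded for each candidate number.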

Most preferably, FIG. 8 shows the root mean square error (RMSE) for each number of hidden units obtained with k=10.

As FIG. 8 shows, the average value of the correlation (MI, Mutual Information) of the hidden units decreases substantially up to about LSTM5, much as the RMSE value does. Although the two graphs do not match perfectly, the RMSE and the average value of the correlation (MI, Mutual Information) of the hidden units behave similarly as the number of hidden units increases. Accordingly, the correlation (MI, Mutual Information) of the hidden units can replace the conventionally used RMSE value.

Specific parts of the present invention have been described in detail above. It is clear to those of ordinary skill in the art that this specific description is merely a preferred embodiment and that the scope of the present invention is not limited thereto.

Accordingly, the substantial scope of the present invention is defined by the appended claims and their equivalents.

Claims

A method for selecting the number of hidden units using the correlation of hidden units in an artificial neural network model, the method comprising:
a data input step of inputting time series data to the artificial neural network model;
an MI estimation step of estimating a correlation (MI, Mutual Information) of hidden units according to the number of hidden units for the time series data;
an MI average value calculation step of calculating an average value of the correlation (MI, Mutual Information) of the hidden units for each number of the hidden units;
a polynomial regression analysis step of deriving, using polynomial regression, a second-order polynomial in which the independent variable is the number of hidden units and the dependent variable is the average value of the correlation (MI, Mutual Information) of the hidden units; and
a hidden unit number selection step in which the second-order polynomial is differentiated once with respect to the number of hidden units, and 0 is substituted for the average value of the correlation (MI, Mutual Information) of the hidden units so that the correlation between the hidden units is removed, whereby the most appropriate number of hidden units of the artificial neural network model is selected.

The method of claim 1,
wherein the average value of the correlation (MI, Mutual Information) of the hidden units decreases as the number of the hidden units increases, because the unique characteristics of each hidden unit are strengthened, and, past the minimum point, is maintained or increases again.

The method of claim 1,
wherein, in the polynomial regression analysis step, the second-order polynomial is calculated by the following [Equation 1].
[Equation 1]

Here, y is the average value of the correlation (MI, Mutual Information) of the hidden units, x is the number of the hidden units, and the remaining symbols in [Equation 1] are constants.

The method of claim 1,
wherein, in the hidden unit number selection step, the number (x) of the hidden units is rounded at the first decimal place and derived as a positive integer.

A method for predicting a hydrological climate variable using a method for selecting the number of hidden units, the method comprising:
a data input step of inputting climate factor time series data into an artificial neural network model;
an MI estimation step of estimating a correlation (MI, Mutual Information) of hidden units according to the number of hidden units for the climate factor time series data;
an MI average value calculation step of calculating an average value of the correlation (MI, Mutual Information) of the hidden units for each number of the hidden units;
a polynomial regression analysis step of deriving, using polynomial regression, a second-order polynomial in which the independent variable is the number of hidden units and the dependent variable is the average value of the correlation (MI, Mutual Information) of the hidden units;
a hidden unit number selection step in which the second-order polynomial is differentiated once with respect to the number of hidden units, and 0 is substituted for the average value of the correlation (MI, Mutual Information) of the hidden units so that the correlation between the hidden units is removed, whereby the most appropriate number of hidden units of the artificial neural network model is selected; and
a hydrological climate variable output step in which the climate factor time series data is analyzed using the artificial neural network model having the selected number of hidden units and a hydrological climate variable is output.

The method of claim 5,
wherein the average value of the correlation (MI, Mutual Information) of the hidden units decreases as the number of the hidden units increases, because the unique characteristics of each hidden unit are strengthened, and, past the minimum point, is maintained or increases again.

The method of claim 5,
wherein the climate factor time series data is PDO (Pacific Decadal Oscillation) time series data representing the decadal-scale oscillation of the Pacific.

The method of claim 7,
wherein, in the polynomial regression analysis step, the second-order polynomial is calculated by the following [Equation 2].
[Equation 2]

Here, y is the average value of the correlation (MI, Mutual Information) of the hidden units, and x is the number of the hidden units.

The method of claim 5,
wherein, in the hidden unit number selection step, the number (x) of the hidden units is rounded at the first decimal place and derived as a positive integer.