KR102561402B1

KR102561402B1 - Surrogate model construction method for extrapolation prediction and apparatus therefor

Info

Publication number: KR102561402B1
Application number: KR1020210119222A
Authority: KR
Inventors: 김종호; 쩐옥빈
Original assignee: 울산대학교 산학협력단
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2023-08-01
Also published as: KR20230036411A

Abstract

Embodiments relate to a method and apparatus for generating a surrogate model that predicts extrapolation phenomena. A method for generating a surrogate model for predicting an extrapolation phenomenon according to an embodiment may include: estimating PCE coefficients based on a polynomial order and determining a parameter of an autocorrelation function; constructing, from the PCE coefficients and the parameter, a plurality of surrogate models having different Gaussian variances according to the iteration number of an iterative algorithm; and selecting, from among the plurality of surrogate models, the surrogate model having the minimum deviation from the actual result.

Description

A method for generating a surrogate model for predicting an extrapolation phenomenon and an apparatus therefor

Embodiments relate to a method and apparatus for generating a surrogate model that predicts an extrapolation phenomenon.

In general, data-driven models (including machine learning and deep learning models) learn from currently collected data and predict the phenomena that will occur. Such models have the drawback that their predictive power drops sharply for regions of the data space that lie far outside the training data (e.g., extreme events such as floods).

In this regard, a PCE (Polynomial Chaos Expansion) model uses a set of orthogonal polynomials to capture the overall trend of the original model. Constructing a PCE model generally requires sufficient training data to prevent underfitting.

Ordinary kriging, in turn, is a method that builds a global trend model and interpolates the data using local deviations obtained by maximum likelihood estimation.

The invention according to the embodiments aims to provide a surrogate model that reproduces the behavior of the original model, and a method for generating it, by modeling the overall tendency or trend of the data with the PCE technique and compensating for local variations that the trend does not capture with the kriging technique.

A method of generating a surrogate model for predicting an extrapolation phenomenon may be provided, the method comprising: estimating PCE coefficients based on a polynomial order and determining a parameter of an autocorrelation function; constructing, from the PCE coefficients and the parameter, a plurality of surrogate models having different Gaussian variances according to the iteration number of an iterative algorithm; and selecting, from among the plurality of surrogate models, the surrogate model having the minimum deviation from the actual result.

The constructing of the plurality of surrogate models may include deriving, for the corresponding polynomial order, the trend function corresponding to the PCE coefficients and the parameter, and the variance given by the autocorrelation function.

The plurality of surrogate models may take a form in which the iterative algorithm adds a new term to the trend function at each iteration.

The selecting of the surrogate model may include calculating a leave-one-out error for each of the plurality of surrogate models.

The selecting of the surrogate model may include selecting the surrogate model whose order gives the minimum deviation from the actual result.

The method may further include quantifying the uncertainty of the selected surrogate model based on a performance index.

The performance index may be defined based on the Euclidean distance between the ensemble sets of the selected surrogate model and the original model, and on the relative runtime of the selected surrogate model and the original model.

The parameter δ of the autocorrelation function may be obtained through maximum likelihood estimation or leave-one-out cross-validation.

The PCE coefficients ε_α may be calculated by at least one of least squares regression, Bayesian compressive sensing, and least angle regression.

The invention according to the embodiments can provide a surrogate model that reproduces the behavior of the original model, and a method for generating it, by modeling the overall tendency or trend of the data with the PCE technique and compensating for local variations that the trend does not capture with the kriging technique.

FIG. 1 is a flowchart of a method for generating a surrogate model for predicting an extrapolation phenomenon, in an embodiment.
FIG. 2 is a diagram for explaining a method of generating a surrogate model and evaluating its performance, in an embodiment.
FIG. 3 is a diagram evaluating the performance of the surrogate models in terms of error, in an embodiment.
FIG. 4 is a graph showing the results of applying the surrogate model to events that actually occurred, in an embodiment.
FIG. 5 is a diagram showing the effect of the acceptance threshold on the performance of the original model and the surrogate model, in an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Because various changes can be made to the embodiments, however, the scope of the patent application is not restricted or limited by these embodiments. All modifications, equivalents, and substitutes of the embodiments should be understood to fall within the scope of rights.

The terms used in the embodiments are for description only and should not be construed as limiting. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" specify that the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification are present, and should be understood not to exclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

In the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted. In describing the embodiments, a detailed description of related known technology is omitted where it is judged that it would unnecessarily obscure the gist of the embodiments.

In describing the components of the embodiments, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another, and do not limit the nature, sequence, or order of the components. When a component is described as being "connected", "coupled", or "joined" to another component, it may be directly connected or joined to that other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between them.

A component that has a function in common with a component included in one embodiment is described using the same name in other embodiments. Unless stated otherwise, the description given for one embodiment also applies to the other embodiments, and detailed descriptions are omitted to the extent that they overlap.

In an embodiment, a hydrological surrogate model may be provided using the PCE (Polynomial Chaos Expansion) and OK (ordinary kriging) techniques. The model to be replaced can be expressed simply as Equation 1 below.

[Equation 1]

Here, M denotes a hydrological model, and X denotes the model inputs, which include the parameters θ, the model states x, and the forcings u. Y denotes the corresponding model output (Quantity of Interest, QoI), which may be a scalar quantity or a vector of several QoIs. The number of inputs, N_X, is the sum of the number of parameters (N_P), the number of model states (N_S), and the number of forcings (N_I).
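Read together with these definitions, Equation 1 presumably has the simple form below. This is a reconstruction from the surrounding text rather than a copy of the original formula, and the tuple notation for X is an assumption.

```latex
Y = M(X), \qquad X = (\theta,\, x,\, u), \qquad N_X = N_P + N_S + N_I
```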

In general, given N_ED input-output pairs of the model in Equation 1 (the training data set, consisting of the experimental design X and the corresponding model responses Y), the PCK proposed in the embodiment constructs a surrogate model (M^su) that can replace the model M.

The surrogate model provided by the present invention according to the embodiment is hereinafter referred to as PCK. PCK models the overall tendency or trend of the data with the PCE technique and compensates for local variations not captured by that trend with the ordinary kriging technique.

In an embodiment, PCK consists of two terms in the input X, as in Equation 2 below.

[Equation 2]

[Equation 3]

In Equation 2, the first term, the sum of ε_αΨ_α(X) over α, is identical to a PCE and serves as the trend function.

Ψ_α(X) are multivariate polynomials that are orthogonal with respect to the random input variables X and are chosen according to the Wiener-Askey scheme for uniform, Gaussian, beta, and gamma distributions. α is the multi-index that identifies the components of the multivariate polynomial, and ε_α is the unknown PCE coefficient corresponding to Ψ_α(X).

In the second term of Equation 2, σ²Z(X), σ² is the Gaussian process variance (or kriging variance). Z(X) is a stationary Gaussian process with zero mean and unit variance, characterized by the autocorrelation function R(X,X')=R(|X-X'|;δ) between two arbitrary sets of model inputs X and X', where δ is the hyperparameter of the autocorrelation function and has to be estimated.

P in Equation 3 is the number of polynomial basis terms (equivalently, the number of PCE coefficients), which depends on N_X and on the polynomial order p.
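From the term-by-term description above, Equations 2 and 3 can be written out roughly as follows. This is a reconstruction, not the original formulas: the symbol M^PCK is introduced here for illustration, and the expression for P assumes the usual total-degree truncation of the polynomial basis, which the text does not state explicitly.

```latex
% Equation 2 (reconstructed): PCE trend plus a scaled Gaussian process
M^{PCK}(X) = \sum_{\alpha=1}^{P} \varepsilon_\alpha \Psi_\alpha(X) + \sigma^2 Z(X)

% Equation 3 (assumed total-degree truncation)
P = \frac{(N_X + p)!}{N_X!\; p!}
```

Under that assumption, a model with N_X = 3 inputs and polynomial order p = 2 would have P = 5!/(3!·2!) = 10 basis terms.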

The PCK according to the embodiment is built in three procedures. The method according to the embodiment is described below with reference to FIG. 1.

FIG. 1 is a flowchart for explaining a method of building the surrogate model (PCK) in an embodiment.

In step 110, the apparatus estimates the coefficients of the surrogate model based on the polynomial order and determines the parameter of the autocorrelation function.

First, the PCE coefficients are estimated. The PCE coefficients ε_α can be calculated by several methods, including projection, least squares regression, Bayesian compressive sensing, and least angle regression.

In the embodiment, least angle regression may be used, because it estimates PCE coefficients from a small experimental design more efficiently than least squares regression. By reducing the number of PCE coefficients that have to be estimated, it also mitigates the curse of dimensionality, in which the number of uncertain inputs grows with the dimension.

The PCE coefficients according to the embodiment may be estimated as in Equation 4.

[Equation 4]

The embodiment estimates the optimal PCE coefficients from the model inputs X, consisting of N_x samples, and the corresponding model outputs Y.

Here, k is the index over the N_x samples, λ is a non-negative constant, and ‖ε‖₁ is the regularization term added to the minimization so that it favors sparse solutions; it is computed as the sum of the absolute values of the coefficients. Equation 4 can be expressed as Equation 5 below.

[Equation 5]

Here, F is the information matrix of size N_x × P, whose general term is given as follows.

[Equation 6]
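The estimation step can be illustrated with a small sketch. It assumes a single input scaled to [-1, 1] with a Legendre basis (the Wiener-Askey choice for a uniform input), takes the general term of the information matrix to be F[k, α] = Ψ_α(x_k), and uses ordinary least squares, one of the methods the text lists, rather than least angle regression. The function names are illustrative only, and the polynomials are not normalized.

```python
import numpy as np
from numpy.polynomial import legendre


def information_matrix(x, n_terms):
    """Return F with F[k, a] = Psi_a(x[k]) for a Legendre basis (1-D input)."""
    x = np.asarray(x, dtype=float)
    # legvander gives the columns L_0(x), ..., L_{n_terms-1}(x)
    return legendre.legvander(x, n_terms - 1)


def pce_coefficients_ols(F, y):
    """Least-squares PCE coefficients (a stand-in for least angle regression)."""
    coef, *_ = np.linalg.lstsq(F, np.asarray(y, dtype=float), rcond=None)
    return coef


# Toy experimental design: the response is exactly 1*L0 + 2*L1 + 0.5*L2.
x = np.linspace(-1.0, 1.0, 9)
y = 1.0 + 2.0 * x + 0.5 * (3.0 * x**2 - 1.0) / 2.0
F = information_matrix(x, n_terms=4)
eps = pce_coefficients_ols(F, y)
print(np.round(eps, 6))
```

A sparse solver such as least angle regression would additionally drive small coefficients (here the fourth one) to exactly zero, which is the property the text relies on for high-dimensional inputs.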

In the second procedure, the parameter δ of the autocorrelation function and the Gaussian process variance σ² are determined. The parameter δ can be obtained through maximum likelihood estimation or leave-one-out cross-validation.

Of these, the leave-one-out cross-validation method can be expressed as in Equation 7 below.

[Equation 7]

Here, the left-hand side denotes the optimal δ, and R denotes the correlation matrix evaluated between pairs of samples of the experimental design x.
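A brute-force version of this selection can be sketched as follows. The sketch makes several assumptions that are not in the text: a Gaussian autocorrelation function, a zero trend, a one-dimensional input, a small nugget for numerical stability, and a grid search over candidate values of δ in place of a proper optimizer.

```python
import numpy as np


def gaussian_corr(a, b, delta):
    """R(|a - b|; delta) = exp(-(|a - b| / delta)^2) for 1-D inputs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-((d / delta) ** 2))


def loo_score(x, y, delta, nugget=1e-8):
    """Mean squared leave-one-out error of a zero-trend kriging interpolator."""
    n = len(x)
    errors = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k
        R = gaussian_corr(x[keep], x[keep], delta) + nugget * np.eye(n - 1)
        r = gaussian_corr(x[k:k + 1], x[keep], delta)  # shape (1, n - 1)
        prediction = r @ np.linalg.solve(R, y[keep])
        errors[k] = y[k] - prediction[0]
    return float(np.mean(errors**2))


def select_delta(x, y, candidates):
    """Return the candidate delta with the smallest leave-one-out score."""
    scores = [loo_score(x, y, d) for d in candidates]
    return candidates[int(np.argmin(scores))], scores


x = np.linspace(0.0, 1.0, 12)
y = np.sin(2.0 * np.pi * x)
candidates = [0.05, 0.1, 0.2, 0.4]
delta_hat, scores = select_delta(x, y, candidates)
print(delta_hat)
```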

In step 120, the apparatus constructs, from the PCE coefficients and the parameter, a plurality of surrogate models having different Gaussian variances according to the iteration number of the iterative algorithm.

In an embodiment, given x, y, R(X, X'), the optimal δ, and the terms ε_αΨ_α(X), an iterative algorithm that optimizes the Gaussian variance σ² is applied. Each term ε_αΨ_α(X) is regarded as a candidate for the trend part of the kriging model, and the P candidates are indexed by α = 1, ..., P.

In an embodiment, the initial PCK (α = 1) has a trend part consisting of the single term ε₁Ψ₁(X). The iterative algorithm then adds the polynomials to the trend part one at a time, and at each iteration the Gaussian variance is estimated as in Equation 8 below.

[Equation 8]

Here, R denotes the optimal correlation matrix and F is the information matrix calculated by Equation 6. Each iteration of the algorithm yields a surrogate model PCK_α with a different variance.
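The iteration can be sketched as below, assuming Equation 8 is the standard kriging variance estimate σ² = (y − Fε)ᵀ R⁻¹ (y − Fε) / N. That form is an assumption, since the formula itself is not reproduced in this text, and the sketch keeps the PCE coefficients fixed across iterations for simplicity.

```python
import numpy as np


def gp_variance(y, F, eps, R):
    """Assumed form of Equation 8: (y - F eps)^T R^{-1} (y - F eps) / N."""
    residual = np.asarray(y, dtype=float) - F @ eps
    return float(residual @ np.linalg.solve(R, residual) / len(residual))


def sequential_variances(y, F, eps, R):
    """Variance of PCK_a for a = 1..P, adding one trend term per iteration."""
    n_terms = F.shape[1]
    return [gp_variance(y, F[:, :a], eps[:a], R) for a in range(1, n_terms + 1)]


# Toy case: y = 1 + 2x, trend candidates {1, x}, uncorrelated residuals.
x = np.array([-1.0, 0.0, 1.0])
y = 1.0 + 2.0 * x
F = np.column_stack([np.ones_like(x), x])
eps = np.array([1.0, 2.0])
R = np.eye(3)
variances = sequential_variances(y, F, eps, R)
print(variances)
```

In this toy case the variance falls to zero once both trend terms are included, because the trend alone then reproduces the data.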

In step 130, the apparatus selects, from among the plurality of surrogate models, the surrogate model having the smallest deviation from the actual result.

Among the P PCK models, the optimal PCK, the one whose deviation from the results of the original model is smallest, is selected. To express this deviation quantitatively, the leave-one-out error may be used, as in Equation 9.

[Equation 9]

In Equation 9, PCK_x(-k) is the PCK model built from the experimental design of size N_x − 1 that remains after the k-th sample is left out.
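The selection rule can be sketched generically. The example below assumes a plain mean squared leave-one-out error (the original formula may be normalized differently) and uses ordinary polynomial fits as hypothetical stand-ins for the candidate PCK_α models.

```python
import numpy as np
from numpy.polynomial import polynomial as poly


def leave_one_out_error(x, y, fit):
    """Mean squared LOO error; fit(x_train, y_train) must return a predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    total = 0.0
    for k in range(n):
        keep = np.arange(n) != k
        model = fit(x[keep], y[keep])  # model built without sample k
        total += (y[k] - float(model(x[k]))) ** 2
    return total / n


def fit_poly(degree):
    """Hypothetical candidate model: a least-squares polynomial of given degree."""
    def fit(x_train, y_train):
        coef = poly.polyfit(x_train, y_train, degree)
        return lambda x_query: poly.polyval(x_query, coef)
    return fit


x = np.linspace(-1.0, 1.0, 8)
y = x**2
errors = {p: leave_one_out_error(x, y, fit_poly(p)) for p in (1, 2, 3)}
best = min(errors, key=errors.get)
print(best, errors)
```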

In the embodiment, the GLUE technique may be used to quantify the uncertainty of the surrogate model provided by the present invention. Two criteria, an accuracy-based threshold and an efficiency-based threshold, are set, and their advantages, disadvantages, and results are presented.

A parameter inference method may be applied to infer how likely each value of the uncertain parameter θ is to yield model predictions that agree with actual observations. The inference produces the posterior distribution of the parameters given the observed streamflow, and the possible ensemble results for that distribution can then be demonstrated. The posterior distribution of the parameters conditioned on the observations y_obs can generally be expressed using Bayes' rule as follows.

[Equation 10]

Here, ρ(θ) is the prior distribution of θ, constructed from prior knowledge. The likelihood term is the conditional probability of the model results given a parameter set, and the left-hand side is the posterior distribution of the parameter θ.
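Bayes' rule for this setting can be written as below. The conditional notation ρ(· | ·) is an assumption made by analogy with the prior ρ(θ), since the original symbols are not reproduced in this text.

```latex
\rho(\theta \mid y_{obs}) =
  \frac{\rho(y_{obs} \mid \theta)\,\rho(\theta)}
       {\int \rho(y_{obs} \mid \theta)\,\rho(\theta)\,d\theta}
```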

In an embodiment, the GLUE technique may be applied. It is simple to implement and allows a flexible choice of the informal likelihood function and the corresponding cutoff threshold. The technique avoids over-tuning and can exclude model parameter sets that do not guarantee predictive performance.

Various likelihood functions can be applied in the context of GLUE. In the embodiment, a combination of the Nash-Sutcliffe efficiency (NSE), the peak error (PE), and the volume error (VE) is used to characterize deviations in the shape, peak, and volume of the flood hydrograph at the same time, and is expressed as the likelihood function (L) of Equation 11 below.

[Equation 11]

Here, y_t^Obs and y_t are the observed and simulated streamflow at time t, respectively. T is the total number of time steps of the flood event, and y_max^Obs and y_max are the observed and simulated streamflow at the peak, respectively. The remaining two symbols denote the total volumes of the observed and simulated hydrographs, respectively.

The three bracketed parts of Equation 11 are the complement of the NSE (1-NSE), the PE, and the VE, respectively. Equation 11 takes values between 0 and 1, and the closer the value is to the minimum, the smaller the error. In the embodiment, the cutoff threshold can be specified either as an allowable deviation of the likelihood function (hereinafter, the accuracy-based threshold) or as a fixed ratio of the total number of simulations (hereinafter, the efficiency-based threshold). All simulations performed are divided into runs that satisfy this threshold condition and runs that do not, and this division is used to track the value of Equation 10.
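The three error components and the two kinds of cutoff can be sketched as follows. How Equation 11 combines the components is not recoverable from this text, so the equal-weight average below is an assumption, as are the function names and the use of a plain sum over time steps as the hydrograph volume (i.e., a unit time step).

```python
import numpy as np


def error_components(obs, sim):
    """Return (1 - NSE, peak error, volume error) for one flood event."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    one_minus_nse = np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
    peak_error = abs(obs.max() - sim.max()) / obs.max()
    volume_error = abs(obs.sum() - sim.sum()) / obs.sum()
    return one_minus_nse, peak_error, volume_error


def likelihood(obs, sim):
    """ASSUMPTION: equal-weight average of the three components (0 = perfect)."""
    return float(np.mean(error_components(obs, sim)))


def behavioral_mask(L, accuracy_threshold=None, efficiency_ratio=None):
    """Mark the runs that pass an accuracy-based or efficiency-based cutoff."""
    L = np.asarray(L, dtype=float)
    if accuracy_threshold is not None:
        return L <= accuracy_threshold  # keep runs within the allowed deviation
    if efficiency_ratio is None:
        raise ValueError("specify accuracy_threshold or efficiency_ratio")
    n_keep = max(1, int(round(efficiency_ratio * len(L))))
    mask = np.zeros(len(L), dtype=bool)
    mask[np.argsort(L)[:n_keep]] = True  # keep the best fixed fraction of runs
    return mask


obs = [1.0, 3.0, 2.0]
print(likelihood(obs, [1.0, 2.0, 2.0]))
print(behavioral_mask([0.1, 0.5, 0.3, 0.9], accuracy_threshold=0.4))
print(behavioral_mask([0.1, 0.5, 0.3, 0.9], efficiency_ratio=0.25))
```

The accuracy-based cutoff fixes the quality of the retained runs but not their number, while the efficiency-based cutoff fixes their number but not their quality, which is the trade-off the text refers to.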

In an embodiment, a framework for surrogate-model-based uncertainty quantification may be proposed.

FIG. 2 shows, in an embodiment, how a surrogate model is constructed and how the surrogate model is combined with GLUE to infer model parameters efficiently.

Step 201 concerns obtaining a collection of experimental designs x and model responses y.

In general, as in step 201, a set of inputs and outputs of the hydrological model (that is, the experimental design and the model responses) is used to construct the surrogate model (202). The surrogate model (202) enables fast computation of the inverse inference (203) of the uncertain parameters of the hydrological model.

Step 202 describes how the surrogate model is constructed. For this method, refer to the description of FIG. 1.

Step 203 is the inference step and describes how the surrogate model generated in step 202 is used to perform GLUE. The embodiment can obtain the ensemble streamflow and the posterior parameters based on the likelihood function, the two types of thresholds (accuracy-based or efficiency-based), and the observed streamflow.

Specifically, in step 201, the experimental design x consists of N_x sets of θ, x, and u, where the θ values are drawn at random from the uniform (prior) distribution ρ(θ) using Latin hypercube sampling (LHS). The state x is initialized to the zero vector, and the values of the forcing u are collected from past climate data (e.g., rainfall). The corresponding response values y are obtained by applying x to the hydrological model M.
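The sampling of θ in step 201 can be illustrated with a minimal NumPy implementation of Latin hypercube sampling over a uniform prior. The parameter bounds and sample count below are made up for the example, and the function name is illustrative.

```python
import numpy as np


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points from independent uniform priors using LHS.

    bounds: sequence of (low, high) pairs, one per parameter.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    n_params = len(bounds)
    # One point inside each of n_samples equal-probability strata per parameter
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, n_params))) / n_samples
    # Shuffle each column independently so strata are paired at random
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])


# Hypothetical example: two uncertain parameters, ten design points.
theta = latin_hypercube(10, [(0.0, 1.0), (10.0, 20.0)], seed=0)
print(theta.shape)
```

Each parameter range is split into as many strata as there are samples and every stratum is hit exactly once, so a small design still covers the whole prior range.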

Step 202 shows the procedure for constructing the surrogate model for the given x and y.

First, the PCE coefficients (ε_α) are estimated by Equation 4, given x, y, and the polynomial order (p), and the parameter δ of the autocorrelation function R(X, X′) is optimized as in Equation 7. Once the PCE coefficients and the parameter are both determined, the iterative algorithm optimizes the Gaussian variance and starts constructing surrogate models, continuing up to P iterations. Among the constructed surrogate models, the one with the smallest error as calculated by Equation 9 is selected as the optimal surrogate model.

Once the surrogate model is constructed, it can be used for parameter inference at low computational cost. Step 203 provides the posterior distribution of the parameters and the uncertainty interval of the simulated streamflow. The streamflow predictions retained are those that agree with the observed streamflow and satisfy the acceptance threshold. In the embodiment, the two types of thresholds, the acceptance accuracy threshold and the acceptance efficiency threshold, may be brought together. In GLUE, θ and x are initialized as in step 201, whereas the values of the forcing u may be taken from potential future events, including possible extreme events.

To examine how accurately the three aforementioned surrogate models (PCE, OK, and PCK) mimic the original model on the experimental design during training, the leave-one-out error of Equation 9 may be used. The smaller this error, the more accurate the emulator.

Ensemble predictions need to be evaluated with both deterministic and probabilistic measures. The likelihood function L adopted in GLUE can also serve as the deterministic measure. The continuous ranked probability score (CRPS) and the Spread may be chosen as probabilistic measures. CRPS measures the closeness between the distribution of the ensemble prediction and the observation at a single time step. In the embodiment, the quantity of interest is the mean CRPS over the flood event, given in Equation 12, whose ideal value is 0. The second probabilistic metric, Spread, expresses the dispersion of the ensemble prediction relative to the observations; smaller values indicate more reliable predictions. Spread equals the square root of the mean ensemble variance over the evaluation period, as formulated in Equation 13.

[Equation 12]

[Equation 13]

Here, F(y) and F(y^Obs) are the cumulative distributions of the ensemble streamflow prediction and of the observation at time t, respectively, and l is the index over the members of the model prediction ensemble of size NE.
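Both probabilistic metrics can be computed directly from an ensemble, as sketched below. The sketch assumes the observation's cumulative distribution is a step function at the observed value and uses the equivalent empirical identity CRPS = E|X − y| − ½·E|X − X′|; the population variance (ddof = 0) used for Spread is also an assumption.

```python
import numpy as np


def crps_ensemble(members, obs):
    """Empirical CRPS at one time step: E|X - y| - 0.5 * E|X - X'|."""
    m = np.asarray(members, dtype=float)
    term_obs = np.mean(np.abs(m - obs))
    term_pairs = 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))
    return float(term_obs - term_pairs)


def mean_crps(ensemble, obs):
    """Average CRPS over an event; ensemble has shape (T, NE), obs shape (T,)."""
    return float(np.mean([crps_ensemble(e, o) for e, o in zip(ensemble, obs)]))


def spread(ensemble):
    """Square root of the time-averaged ensemble variance."""
    ens = np.asarray(ensemble, dtype=float)
    return float(np.sqrt(np.mean(np.var(ens, axis=1))))


# Two time steps, two ensemble members each.
ensemble = np.array([[0.0, 2.0], [1.0, 1.0]])
observed = np.array([1.0, 1.0])
print(mean_crps(ensemble, observed), spread(ensemble))
```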

상기의 메트릭 외에도 원형 모델과 비교하여 대체 모델의 정확도와 효율성에 가중치를 부여하여 대체 모델(M^su)의 전체 성능을 평가할 수 있는 성능 점수(PS)가 제안될 수 있다.In addition to the above metrics, a performance score (PS) can be proposed to evaluate the overall performance of the surrogate model (M ^su ) by weighting the accuracy and efficiency of the surrogate model compared to the original model.

[수학식 14][Equation 14]

[수학식 15][Equation 15]

[수학식 16][Equation 16]

여기서 L(M^su) 및 L(M)은 각각 대체 및 원본 모델의 동작 실행에 대한 가능도 함수 값을 나타낸다. GLUE(M^su) 및 GLUE(M)은 GLUE에 의한 대체 및 원본 모델의 동작 실행을 달성하기 위한 총 런타임을 의미한다. RT(M^su) 및 RT(M)은 각각 대체 및 원본 모델에 대한 단일 실행에 필요한 런타임을 나타내며, Nruns(M^su) 및 Nruns(M)은 각각 컷오프 임계값 조건을 충족하는 사전 정의된 수의 행동 실행을 획득하는 데 필요한 이전 실행 수를 나타낸다.where L(M ^su ) and L(M) denote the likelihood function values for motion executions of the replacement and original models, respectively. GLUE(M ^su ) and GLUE(M) refer to the total runtime to achieve replacement and operation execution of the original model by GLUE. RT(M ^su ) and RT(M) represent the runtimes required for a single run for the replacement and original models, respectively, and Nruns(M ^su ) and Nruns(M), respectively, a predefined number of times that meet the cutoff threshold condition. Indicates the number of previous runs required to acquire an action run.

The first term of Equation 14 represents accuracy performance; according to Equation 15, it can be defined as the Euclidean distance between the two ensemble sets L(M^su) and L(M). Since it is a distance, a value of zero indicates that the surrogate model has the same accuracy as the original model and can effectively replace it. The second term represents efficiency performance and can be defined as the relative difference in the total runtime required for uncertainty quantification. The smaller this value, the more efficient the surrogate model.

The PS, which combines the two terms of Equation 14, ranges from 0 to infinity. A PS close to 0 means that the selected surrogate model is about as accurate as the original model and completes uncertainty quantification in a very short time, whereas a PS approaching infinity indicates that both the accuracy and the efficiency of the surrogate model are very low.
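The way the two terms combine can be sketched in Python. Equations 14 to 16 are not reproduced in this text, so every formula below is an assumption made for illustration: the total GLUE runtime is taken as RT times Nruns, the efficiency term as the ratio of the surrogate's total runtime to the original model's, and the PS as a weighted sum with hypothetical weights `w_acc` and `w_eff`. Only the Euclidean distance for the accuracy term comes directly from the description.

```python
import math


def accuracy_term(l_surrogate, l_original):
    """Euclidean distance between the two sets of likelihood values."""
    if len(l_surrogate) != len(l_original):
        raise ValueError("both ensemble sets must have the same size")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(l_surrogate, l_original)))


def total_runtime(rt_single, n_runs):
    """Assumed total GLUE runtime: single-run runtime times number of runs."""
    return rt_single * n_runs


def efficiency_term(rt_su, nruns_su, rt_orig, nruns_orig):
    """Assumed form: surrogate total runtime relative to the original's."""
    return total_runtime(rt_su, nruns_su) / total_runtime(rt_orig, nruns_orig)


def performance_score(l_su, l_orig, rt_su, nruns_su, rt_orig, nruns_orig,
                      w_acc=0.5, w_eff=0.5):
    """Assumed weighted sum of the accuracy and efficiency terms."""
    return (w_acc * accuracy_term(l_su, l_orig)
            + w_eff * efficiency_term(rt_su, nruns_su, rt_orig, nruns_orig))


# Illustrative numbers only: identical likelihoods, surrogate 100x faster.
ps = performance_score([0.1, 0.2], [0.1, 0.2],
                       rt_su=0.01, nruns_su=1000, rt_orig=1.0, nruns_orig=1000)
print(ps)
```

Under these assumptions the score is non-negative and shrinks toward 0 as the surrogate matches the original model's likelihoods and its total runtime becomes negligible, in line with the range described above.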

FIG. 3 shows an evaluation of the error-related performance of the surrogate models in an embodiment.

The embodiment shows an error comparison among the surrogate models when they are trained on a small amount of data. The PCK surrogate model accurately reproduces the behavior of the original model, whereas the PCE and OK surrogate models perform relatively poorly. For example, the leave-one-out error differs by a factor of about 2 to 7.
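The leave-one-out comparison can be illustrated with a small Python sketch. It does not implement PCE, OK, or PCK: two simple stand-in models (a constant and a straight line) are used only to show how a leave-one-out error is computed by refitting without each point and how the candidate with the smallest error is selected. The normalization by the variance of the outputs and all function names are assumptions.

```python
from statistics import fmean, pvariance


def fit_constant(xs, ys):
    """Stand-in model 1: always predict the training mean."""
    mean_y = fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Stand-in model 2: ordinary least-squares straight line."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def loo_error(xs, ys, fit):
    """Refit without point i, predict point i, average the squared errors.

    The result is divided by the variance of ys (assumed normalization).
    """
    squared = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared.append((ys[i] - model(xs[i])) ** 2)
    return fmean(squared) / pvariance(ys)


def select_by_loo(xs, ys, candidates):
    """Return the name of the candidate with the smallest leave-one-out error."""
    return min(candidates, key=lambda name: loo_error(xs, ys, candidates[name]))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
candidates = {"constant": fit_constant, "line": fit_line}
print(select_by_loo(xs, ys, candidates))
```

Refitting for every left-out point is the most direct formulation; it is chosen here for clarity rather than speed.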

FIG. 4 is a graph showing the results of applying the original model and the surrogate models to events that actually occurred, in an embodiment.

For the example events, event 1 to event 8, FIG. 4(a) shows the prediction results of the original model (NAM), FIG. 4(b) those of the PCE model, FIG. 4(c) those of the OK model, and FIG. 4(d) those of the PCK model of the embodiment.

For each of the eight events, the embodiment shows the observed streamflow together with the predicted streamflow, drawn as the 95% confidence band of a fixed number of ensemble members obtained with each model by the GLUE technique using an accuracy-based threshold of 0.1.
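The GLUE procedure described here can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not stated in the text: the likelihood measure is taken as 1 minus the Nash-Sutcliffe efficiency, so that smaller values are better and a run is behavioral when the measure is at or below the threshold; the band uses unweighted 2.5% and 97.5% quantiles; and the one-parameter toy model stands in for a streamflow model. All names are hypothetical.

```python
import random
from statistics import fmean, quantiles


def one_minus_nse(sim, obs):
    """Assumed likelihood measure: 1 - Nash-Sutcliffe efficiency (0 is perfect)."""
    mean_obs = fmean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return num / den


def glue(model, sample_params, obs, threshold, n_behavioral,
         max_runs=100_000, seed=0):
    """Run the model on random parameters until enough behavioral runs exist.

    Returns the behavioral simulations and the total number of runs needed.
    """
    rng = random.Random(seed)
    behavioral, n_runs = [], 0
    while len(behavioral) < n_behavioral and n_runs < max_runs:
        sim = model(sample_params(rng))
        n_runs += 1
        if one_minus_nse(sim, obs) <= threshold:
            behavioral.append(sim)
    return behavioral, n_runs


def confidence_band(behavioral):
    """Unweighted 95% band per time step (2.5% and 97.5% quantiles)."""
    lower, upper = [], []
    for values in zip(*behavioral):
        cuts = quantiles(values, n=40, method="inclusive")
        lower.append(cuts[0])
        upper.append(cuts[-1])
    return lower, upper


# Toy setup: y_t = a * t, observations generated with a = 2.
times = [1.0, 2.0, 3.0, 4.0]
obs = [2.0 * t for t in times]
runs, n_runs = glue(lambda a: [a * t for t in times],
                    lambda rng: rng.uniform(0.0, 4.0),
                    obs, threshold=0.1, n_behavioral=50)
lower, upper = confidence_band(runs)
print(n_runs, lower, upper)
```

The returned run count plays the role of the number of model executions needed to reach the predefined number of behavioral runs, which is what makes a cheap surrogate attractive in this setting.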

As shown, the more extreme the event, the more clearly the PCK model proposed in the embodiment predicts more accurately than the other models.

FIG. 5 shows the effect of the acceptance threshold on the performance of the original model and the surrogate models in an embodiment.

In the embodiment, the figure shows how two types of acceptance threshold (an accuracy-based threshold and an efficiency-based threshold) affect the accuracy and efficiency of the original model and of each of the three types of surrogate model for the eight events of FIG. 4.

The shaded areas in FIG. 5(a) and FIG. 5(c) represent, for each model, the 95% confidence band of 2,000 values of L corresponding to the threshold, and FIG. 5(b) and FIG. 5(d) show, for each model, the total number of model runs required to implement the GLUE technique.

For uncertainty quantification with the GLUE technique, the advantages, disadvantages, and results of the two criteria, the accuracy-based threshold and the efficiency-based threshold, can be presented.

In the embodiment, the original model NAM and the surrogate model PCK yield more accurate results when the accuracy criterion is lowered, whereas the surrogate models PCE and OK cannot exceed a certain level of accuracy even when the accuracy criterion is lowered.

In the embodiment, when uncertainty is quantified on the basis of efficiency, the original model NAM and the surrogate model PCK are more accurate than the surrogate models PCE and OK.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the computer software art. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiment, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art can apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A method for generating a surrogate model, performed by an apparatus for generating a surrogate model that predicts an extrapolation phenomenon, the method comprising:
estimating polynomial chaos expansion (PCE) coefficients based on a polynomial order and determining parameters of an autocorrelation function;
constructing a plurality of surrogate models having different Gaussian variances according to an iteration order of an iterative algorithm, using the PCE coefficients and the parameters; and
selecting, from among the plurality of surrogate models, a surrogate model having a minimum deviation from an actual result.

The method of claim 1,
wherein constructing the plurality of surrogate models comprises
deriving a variance from a trend function and the autocorrelation function that correspond to the PCE coefficients and the parameters for the corresponding polynomial order.

The method of claim 2,
wherein each of the plurality of surrogate models
has a form in which a new term is added to the trend function at each iteration order of the iterative algorithm.

The method of claim 1,
wherein selecting the surrogate model comprises
calculating a leave-one-out error for each of the plurality of surrogate models.

The method of claim 1,
wherein selecting the surrogate model comprises
selecting the surrogate model corresponding to the order having the minimum deviation from the actual result.

The method of claim 1,
further comprising quantifying uncertainty for the selected surrogate model based on a performance index.

The method of claim 6,
wherein the performance index
is obtained by assigning weights to an accuracy performance, expressed as the Euclidean distance between the ensemble sets of the selected surrogate model and the original model, and to an efficiency performance, defined based on the relative runtimes of the selected surrogate model and the original model.

The method of claim 1,
wherein the parameters are obtained through maximum likelihood estimation or leave-one-out cross-validation.

The method of claim 1,
wherein the PCE coefficients ε_α are calculated by at least one of least squares regression, Bayesian compressive sensing, and least angle regression.

A computer program stored on a computer-readable medium which, in combination with hardware, executes the method of any one of claims 1 to 9.

An apparatus for generating a surrogate model that predicts an extrapolation phenomenon, the apparatus comprising:
one or more processors;
a memory; and
one or more programs stored in the memory and configured to be executed by the one or more processors,
wherein the one or more programs comprise instructions for:
estimating polynomial chaos expansion (PCE) coefficients based on a polynomial order and determining parameters of an autocorrelation function;
constructing a plurality of surrogate models having different Gaussian variances according to an iteration order of an iterative algorithm, using the PCE coefficients and the parameters; and
selecting, from among the plurality of surrogate models, a surrogate model having a minimum deviation from an actual result.

The apparatus of claim 11,
wherein constructing the plurality of surrogate models comprises
deriving a variance from a trend function and the autocorrelation function that correspond to the PCE coefficients and the parameters for the corresponding polynomial order.

The apparatus of claim 12,
wherein each of the plurality of surrogate models
has a form in which a new term is added to the trend function at each iteration order of the iterative algorithm.

The apparatus of claim 11,
wherein selecting the surrogate model comprises
calculating a leave-one-out error for each of the plurality of surrogate models.

The apparatus of claim 11,
wherein selecting the surrogate model comprises
selecting the surrogate model corresponding to the order having the minimum deviation from the actual result.

The apparatus of claim 11,
wherein the one or more programs further comprise instructions for quantifying uncertainty for the selected surrogate model based on a performance index.

The apparatus of claim 16,
wherein the performance index
is defined based on the Euclidean distance between the ensemble sets of the selected surrogate model and the original model and on the relative runtimes of the selected surrogate model and the original model.

The apparatus of claim 11,
wherein the parameters are obtained through maximum likelihood estimation or leave-one-out cross-validation.

The apparatus of claim 11,
wherein the PCE coefficients ε_α are calculated by at least one of least squares regression, Bayesian compressive sensing, and least angle regression.