KR102075370B1

KR102075370B1 - Apparatus and method for processing time series data

Info

Publication number: KR102075370B1
Application number: KR1020180052896A
Authority: KR
Inventors: 이태삼
Original assignee: 경상대학교산학협력단
Priority date: 2018-05-09
Filing date: 2018-05-09
Publication date: 2020-02-10
Also published as: KR20190128783A

Abstract

Disclosed are an apparatus and a method for processing time series data. An apparatus for processing time series data according to an embodiment of the present invention includes: a model input unit that receives model data; a time series processing unit that computes a plurality of time series data from the model data using at least one of a mean-standard deviation, a probability density function estimate, and an empirical cumulative distribution function; and a time series selection unit that selects a single time series data from the plurality of time series data.

Description

Apparatus and method for processing time series data {APPARATUS AND METHOD FOR PROCESSING TIME SERIES DATA}

The present invention relates to data processing technology and, more particularly, to technology for analyzing and processing the time series data of climate models for climate change.

Stochastic simulation of climate models has been used in water resource management, hydrologic design, and many climate-related applications. In recent years, climate downscaling models have been widely used to downscale climate change spatially and temporally in order to assess climate change and its impacts. In particular, interest is growing in the climate change impacts conveyed by temporal time series data.

Climate change studies using climate models can produce a variety of scenarios. In particular, stochastic downscaling based on climate model output generates many different time series. However, accurate climate prediction that accounts for climate change requires a single time series.

Meanwhile, Korean Patent Registration No. 10-1486798, "Stepwise Scaling Method for Bias Correction of Climate Information," discloses a stepwise scaling method for bias correction of climate information that can correct the bias arising in climate simulation by reflecting actual climate characteristics such as extreme rainfall events.

An object of the present invention is to facilitate design analysis in the field by selecting, from among the various time series data of a climate model, a single time series that represents the entire set of time series.

Another object of the present invention is to provide accurate climate prediction and analysis through time series data analysis.

To achieve the above objects, a time series data processing apparatus according to an embodiment of the present invention includes: a model input unit that receives climate model data; a time series processing unit that computes a plurality of time series data from the climate model data using at least one of a mean-standard deviation, a probability density function estimate, and an empirical cumulative distribution function; and a time series selection unit that selects a single time series data by computing a median of the plurality of time series data.

Here, the time series selection unit may select, from the plurality of time series data, the one time series data closest to the median as the single time series data.
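
The median-based selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say how the median of several series is formed or how closeness is measured, so the pointwise median across members and the sum of squared differences used here are assumptions, and the function names are hypothetical.

```python
from statistics import median


def pointwise_median(ensemble):
    """Median across ensemble members at each time step (assumed definition)."""
    return [median(values) for values in zip(*ensemble)]


def closest_to_median(ensemble):
    """Return (index, series) of the member nearest the pointwise median.

    Closeness is the sum of squared differences, which is an assumption;
    the source text only says "closest to the median".
    """
    med = pointwise_median(ensemble)

    def distance(series):
        return sum((x - m) ** 2 for x, m in zip(series, med))

    best = min(range(len(ensemble)), key=lambda i: distance(ensemble[i]))
    return best, ensemble[best]


# Toy ensemble: three members, three time steps each.
ensemble = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 9.0],
]
index, series = closest_to_median(ensemble)
```

Returning an actual ensemble member, rather than the median curve itself, matches the stated goal of selecting one of the generated series.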

Here, the time series selection unit may compute three medians, one for each of the three kinds of pluralities of time series data computed using the mean-standard deviation, the probability density function estimate, and the empirical cumulative distribution function.

Here, the time series selection unit may select three single time series data, each being the one closest to the corresponding one of the three medians.

Here, the time series selection unit may compare how close the three single time series data are to the three medians, and select the single time series data closest to its median as the final single time series data.
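
As a rough sketch of this three-way comparison, the code below selects one member per technique and then keeps the technique whose selected member lies nearest its own median. The mean squared difference as the closeness measure, the pointwise median, the labels `MS`, `DENS`, and `ECDF` as dictionary keys, and all function names are assumptions made for illustration.

```python
from statistics import median


def select_with_distance(ensemble):
    """Return (member index, distance) for the member nearest the pointwise median."""
    med = [median(values) for values in zip(*ensemble)]
    scored = [
        (sum((x - m) ** 2 for x, m in zip(series, med)) / len(med), i)
        for i, series in enumerate(ensemble)
    ]
    dist, idx = min(scored)
    return idx, dist


def select_final(ensembles_by_technique):
    """Pick the technique whose selected member is closest to its own median."""
    results = {
        name: select_with_distance(ensemble)
        for name, ensemble in ensembles_by_technique.items()
    }
    best = min(results, key=lambda name: results[name][1])
    return best, results[best][0], results


# Toy data: three members per technique, two time steps each.
ensembles = {
    "MS": [[0.0, 4.0], [1.0, 1.0], [3.0, 2.0]],
    "DENS": [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]],
    "ECDF": [[1.0, 5.0], [2.0, 2.0], [6.0, 3.0]],
}
technique, member, results = select_final(ensembles)
```

Distances are only comparable across techniques if the three sets of series share a scale, which is why the series would likely need to be expressed in a common format before this comparison.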

Here, the time series selection unit may convert the single time series data computed using the probability density function estimate and the empirical cumulative distribution function into the single time series data format of the climate model data, compute the median of these together with the single time series data computed using the mean-standard deviation, and select the single time series data closest to that median as the final time series data.

Here, for the three single time series data, the time series selection unit may generate box plots based on at least one key statistic among the mean (MEAN), standard deviation (STD), skewness (SKEW), kurtosis (KURT), maximum (MAX), and minimum (MIN).
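
The six key statistics that feed the box plots can be computed per series as in the sketch below. The source names the statistics but not their estimators, so the population (biased) moments and the non-excess kurtosis used here are assumptions, and `summary_statistics` is a hypothetical helper name.

```python
from statistics import fmean, pstdev


def summary_statistics(series):
    """Return the six key statistics of one time series as a dict.

    Uses population moments; skewness and kurtosis are the standardized
    third and fourth central moments (kurtosis is not excess kurtosis).
    """
    n = len(series)
    mean = fmean(series)
    std = pstdev(series)
    if std == 0:
        raise ValueError("skewness and kurtosis are undefined for a constant series")
    centered = [x - mean for x in series]
    skew = sum(c ** 3 for c in centered) / n / std ** 3
    kurt = sum(c ** 4 for c in centered) / n / std ** 4
    return {
        "MEAN": mean,
        "STD": std,
        "SKEW": skew,
        "KURT": kurt,
        "MAX": max(series),
        "MIN": min(series),
    }


stats = summary_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
```

Collecting these dictionaries over many series, for example one per station, gives the samples from which each box plot would be drawn.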

Here, the time series selection unit may select the box plot whose maximum and minimum values are closest to the median, and select as the final time series data whichever of the three single time series data has the largest number of its box plots selected.

Here, the time series selection unit may select the two single time series data with the most box plots selected, and select one of them as the final time series data by considering the maximum value (MAXIMUM) and the outliers (OUTLIER) of their box plots.
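
One possible reading of this box plot vote is sketched below: for each key statistic, the technique whose box plot extremes sit closest to its median wins that statistic, and the technique with the most wins is preferred. The closeness measure (distance from the maximum to the median plus distance from the median to the minimum), the use of raw extremes instead of whisker ends, and the names `spread_about_median` and `vote` are all assumptions. Outlier handling and the tie-break between the two leading techniques are not modeled.

```python
from collections import Counter
from statistics import median


def spread_about_median(values):
    """Distance of the maximum and minimum from the median, summed."""
    med = median(values)
    return (max(values) - med) + (med - min(values))


def vote(stat_values):
    """Count, per technique, how many statistics it wins.

    stat_values[technique][statistic] is a list of values of that statistic,
    for example one value per station (hypothetical layout).
    """
    techniques = list(stat_values)
    statistic_names = list(next(iter(stat_values.values())))
    votes = Counter()
    for stat in statistic_names:
        winner = min(
            techniques,
            key=lambda t: spread_about_median(stat_values[t][stat]),
        )
        votes[winner] += 1
    return votes


# Toy input with two techniques and three of the six statistics.
stat_values = {
    "MS": {"MEAN": [1.0, 2.0, 3.0], "STD": [1.0, 1.0, 2.0], "MAX": [5.0, 6.0, 10.0]},
    "ECDF": {"MEAN": [1.0, 2.0, 5.0], "STD": [1.0, 2.0, 4.0], "MAX": [5.0, 6.0, 7.0]},
}
votes = vote(stat_values)
```

If two techniques tie, `votes.most_common(2)` returns both leaders, which could then be compared on their MAX box plots and outliers as the text suggests.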

In addition, to achieve the above objects, a time series data processing method according to an embodiment of the present invention is a time series data processing method of a time series data processing apparatus, the method including: receiving climate model data; computing a plurality of time series data from the climate model data using at least one of a mean-standard deviation, a probability density function estimate, and an empirical cumulative distribution function; and selecting a single time series data by computing a median of the plurality of time series data.

The present invention can facilitate on-site analysis by selecting, from among the various time series data of a climate model, a single time series that represents the entire set of time series.

In addition, the present invention can provide accurate climate prediction and analysis through time series data analysis.

FIG. 1 is a block diagram illustrating a time series data processing apparatus according to an embodiment of the present invention.
FIG. 2 is an operation flowchart illustrating a time series data processing method according to an embodiment of the present invention.
FIG. 3 is a graph showing time series data of a climate model using the mean-standard deviation (MS) according to an embodiment of the present invention.
FIG. 4 is a graph showing time series data of a climate model using a probability density function (DENS) estimate according to an embodiment of the present invention.
FIG. 5 is a graph showing time series data of a climate model using an empirical cumulative distribution function (ECDF) according to an embodiment of the present invention.
FIG. 6 is a graph showing the GEV distribution of the climate model for three statistical techniques according to an embodiment of the present invention.
FIG. 7 is a diagram showing box plots of the key statistics for three statistical techniques according to an embodiment of the present invention.
FIG. 8 is a graph showing, for 62 regions, box plots of the key statistics for the single time series computed using the mean-standard deviation (MS) and the single time series data selected using the empirical cumulative distribution function (ECDF) according to an embodiment of the present invention.
FIG. 9 is a diagram showing box plots of the key statistics for the mean-standard deviation and the empirical cumulative distribution function according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a computer system according to an embodiment of the present invention.

The present invention will now be described in detail with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments of the present invention are provided to describe the present invention more completely to those of ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating a time series data processing apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the time series data processing apparatus according to an embodiment of the present invention includes a model input unit 110, a time series processing unit 120, and a time series selection unit 130.

The model input unit 110 may receive climate model data.

The time series processing unit 120 may compute a plurality of time series data from the climate model data using at least one of a mean-standard deviation, a probability density function estimate, and an empirical cumulative distribution function.
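
The text does not spell out how each of the three techniques is applied to a series. One plausible reading, offered purely as an assumption, is that each series is re-expressed on a common scale: standardized by its mean and standard deviation, or mapped to its empirical cumulative probability. The sketch below shows those two transforms; a kernel density estimate is omitted because it would require a bandwidth choice the source does not give. The function names are hypothetical.

```python
from bisect import bisect_right
from statistics import fmean, pstdev


def standardize(series):
    """Mean-standard deviation transform: (x - mean) / std, population std."""
    mean = fmean(series)
    std = pstdev(series)
    return [(x - mean) / std for x in series]


def ecdf_values(series):
    """Empirical CDF evaluated at each point: share of values <= x."""
    ordered = sorted(series)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in series]


z = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
p = ecdf_values([3.0, 1.0, 2.0, 4.0])
```

Under this reading, a series chosen in the transformed space would have to be mapped back to the original units, which would be consistent with the conversion to the climate model data format mentioned elsewhere in the description.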

The time series selection unit 130 may select a single time series data by computing a median of the plurality of time series data.

Here, the time series selection unit 130 may select, from the plurality of time series data, the time series data closest to the median as the single time series data.

Here, the time series selection unit 130 may compute three medians, one for each of the three kinds of pluralities of time series data computed using the mean-standard deviation, the probability density function estimate, and the empirical cumulative distribution function.

Here, the time series selection unit 130 may select three single time series data, each being the one closest to the corresponding one of the three medians.

Here, the time series selection unit 130 may compare how close the three single time series data are to the three medians, and select the single time series data closest to its median as the final single time series data.

Here, the time series selection unit 130 may produce box plots of the key statistics, namely the mean (MEAN), standard deviation (STD), skewness (SKEW), kurtosis (KURT), maximum (MAX), and minimum (MIN), for the single time series computed using the mean-standard deviation (MS), the single time series data selected using the probability density function estimate (DENS), the single time series data selected using the empirical cumulative distribution function (ECDF), and a randomly selected single time series data (RAND).

Here, the time series selection unit 130 may select the statistical technique whose box plot maximum and minimum are closest to the median, and select the single time series data of the statistical technique selected most often among the six key statistics.

In addition, the time series selection unit 130 may produce box plots of the key statistics, namely the mean (MEAN), standard deviation (STD), skewness (SKEW), kurtosis (KURT), maximum (MAX), and minimum (MIN), for the single time series data computed using the mean-standard deviation (MS), the single time series data selected using the probability density function estimate (DENS), the single time series data selected using the empirical cumulative distribution function (ECDF), and a randomly selected single time series data (RAND).

Here, the time series selection unit 130 may select the statistical technique whose box plot maximum and minimum are closest to the median, and select the single time series data of the statistical technique selected most often among the six key statistics as the final single time series data.

Here, the time series selection unit 130 may select the single time series data of the two statistical techniques selected most often among the six key statistics.

Here, the time series selection unit 130 may compare the box plots of the two most-selected statistical techniques and select one of the single time series data as the final time series data.

Here, the time series selection unit 130 may select the box plot of one of the two most-selected statistical techniques by considering, among the key statistics of their box plots, the maximum value (MAXIMUM) and the outliers (OUTLIER), and select the single time series data of the selected statistical technique as the final time series data.

In addition, the time series selection unit 130 may convert the single time series data of the two most-selected statistical techniques into single time series data in the original climate model data format, and select the single time series data closest to the actual observations as the final time series data.

FIG. 2 is an operation flowchart illustrating a time series data processing method according to an embodiment of the present invention.

Referring to FIG. 2, the time series data processing method according to an embodiment of the present invention may first receive a model (S210).

That is, in step S210, climate model data may be received.

The time series data processing method according to an embodiment of the present invention may then process time series data (S220).

That is, in step S220, a plurality of time series data may be computed from the climate model data using at least one of a mean-standard deviation, a probability density function estimate, and an empirical cumulative distribution function.

The time series data processing method according to an embodiment of the present invention may then select time series data (S230).

That is, in step S230, a single time series data may be selected by computing a median of the plurality of time series data.

Here, in step S230, the time series data closest to the median may be selected from the plurality of time series data as the single time series data.

Here, in step S230, three medians may be computed, one for each of the three kinds of pluralities of time series data computed using the mean-standard deviation, the probability density function estimate, and the empirical cumulative distribution function.

Here, in step S230, three single time series data may be selected, one closest to each of the three medians.

Here, in step S230, how close the three single time series data are to the three medians may be compared, and the single time series data closest to its median may be selected as the final single time series data.

Here, in step S230, the single time series data selected using the probability density function estimate and the single time series data selected using the empirical cumulative distribution function may be converted into single time series data in the original climate model data format.

Here, in step S230, whichever of the converted single time series data selected using the probability density function estimate, the converted single time series data selected using the empirical cumulative distribution function, and the single time series data selected using the mean-standard deviation is closest to the actual observed time series data may be selected as the final time series data.
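
The observation-based choice among the three candidates can be sketched as below. The source says only "closest to the actual observed time series data", so the root-mean-square error used as the distance is an assumption, as are the function names and the toy values.

```python
from math import sqrt


def rmse(series, observed):
    """Root-mean-square error between a candidate series and the observations."""
    return sqrt(sum((s - o) ** 2 for s, o in zip(series, observed)) / len(observed))


def closest_to_observation(candidates, observed):
    """Return the label of the candidate series with the smallest RMSE."""
    return min(candidates, key=lambda name: rmse(candidates[name], observed))


# Candidates are assumed to be already converted to the original data format.
candidates = {
    "MS": [1.0, 2.0, 3.0],
    "DENS": [2.0, 2.0, 2.0],
    "ECDF": [0.0, 0.0, 0.0],
}
observed = [1.0, 2.0, 4.0]
final = closest_to_observation(candidates, observed)
```

This variant needs an observed record covering the same period as the candidates, so it applies to a historical or validation period rather than to future projections.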

Here, in step S230, the median of the converted single time series data selected using the probability density function estimate, the converted single time series data selected using the empirical cumulative distribution function, and the single time series data selected using the mean-standard deviation may be computed, and the single time series data closest to that median may be selected as the final time series data.

In addition, in step S230, box plots of the key statistics, namely the mean (MEAN), standard deviation (STD), skewness (SKEW), kurtosis (KURT), maximum (MAX), and minimum (MIN), may be generated for the single time series data selected using the mean-standard deviation (MS), the single time series data selected using the probability density function estimate (DENS), the single time series data selected using the empirical cumulative distribution function (ECDF), and a randomly selected single time series data (RAND).

Here, in step S230, the statistical technique whose box plot maximum and minimum are closest to the median may be selected, and the single time series data of the statistical technique selected most often among the six key statistics may be selected as the final single time series data.

Here, in step S230, the single time series data of the two statistical techniques selected most often among the six key statistics may be selected.

Here, in step S230, the box plots of the two most-selected statistical techniques may be compared, and one of the single time series data may be selected as the final time series data.

Here, in step S230, the box plot of one of the two most-selected statistical techniques may be selected by considering, among the key statistics of their box plots, the maximum value (MAXIMUM) and the outliers (OUTLIER), and the single time series data of the selected statistical technique may be selected as the final time series data.

In addition, in step S230, the single time series data of the two most-selected statistical techniques may be converted into single time series data in the original climate model data format, and the single time series data closest to the actual observations may be selected as the final time series data.

FIG. 3 is a graph showing time series data of a climate model using the mean-standard deviation according to an embodiment of the present invention.

Referring to FIG. 3, the graph shows the plurality of time series data (ALL) obtained using the mean-standard deviation of the climate model data.

The graph also shows the median (MED) of the plurality of time series data.

The graph also shows the selected single time series data (SELECTED), which is the one closest to the median (MED).

The graph also shows the actual observations (OBS).

FIG. 4 is a graph showing time series data of a climate model using a probability density function estimate according to an embodiment of the present invention.

Referring to FIG. 4, the graph shows the plurality of time series data (ALL) obtained using the probability density function estimate of the climate model data.

The graph also shows the median (MED) of the plurality of time series data.

FIG. 5 is a graph showing time series data of a climate model using an empirical cumulative distribution function according to an embodiment of the present invention.

Referring to FIG. 5, the graph shows the plurality of time series data (ALL) obtained using the empirical cumulative distribution function of the climate model data.

FIG. 6 is a graph showing the annual maxima of the climate model for three statistical techniques according to an embodiment of the present invention.

Referring to FIG. 6, the graph shows the observations (OBS), the single time series data computed using the mean-standard deviation (MS), the single time series data selected using the probability density function estimate (DENS), and the single time series data selected using the empirical cumulative distribution function (ECDF).
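
For readers who want to reproduce a comparison like the one in FIG. 6, the annual maxima of a series can be extracted as in the sketch below. The `(year, value)` record layout and the function name are hypothetical, since the source does not describe the data format. Fitting a GEV distribution to the resulting maxima would be a separate step that is not shown here.

```python
def annual_maxima(records):
    """Return {year: largest value in that year} from (year, value) records."""
    maxima = {}
    for year, value in records:
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return dict(sorted(maxima.items()))


# Toy records: two values in each of two years.
records = [
    (2001, 10.0),
    (2001, 35.5),
    (2002, 20.0),
    (2002, 12.0),
]
maxima = annual_maxima(records)
```

Running the same extraction on OBS, MS, DENS, and ECDF would give the four sets of annual maxima to compare.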

도 7은 본 발명의 일실시예에 따른 3가지 통계 기법에 대한 주요 통계값들의 박스 플롯(Box Plot)을 나타낸 도면이다.FIG. 7 illustrates a box plot of key statistical values for three statistical techniques according to an embodiment of the present invention.

도 7을 참조하면, 평균-표준편차를 이용하여 산출된 단일 시계열 데이터(MS), 확률밀도함수 추정치를 이용하여 선택된 단일 시계열 데이터(DENS), 경험 누적분포함수를 이용하여 선택된 단일 시계열 데이터(ECDF) 및 무작위로 선택된 단일 시계열 데이터(RAND)에 대한 평균(MEAN), 표준편차(STD), 왜도(SKEW), 첨도(KURT), 최대(MAX) 및 최소(MIN)를 박스 플롯으로 나타낸 것을 알 수 있다.Referring to FIG. 7, single time series data (MS) calculated using the mean-standard deviation, single time series data (DENS) selected using a probability density function estimate, and single time series data (ECDF) selected using an empirical cumulative distribution function Box plot of mean (MEAN), standard deviation (STD), skewness (SKEW), kurtosis (KURT), maximum (MAX), and minimum (MIN) for a single randomly selected time series data (RAND). Able to know.

도 8은 본 발명의 일실시예에 따른 평균-표준편차를 이용하여 산출된 단일 시계열(MS)과 경험 누적분포함수를 이용하여 선택된 단일 시계열 데이터(ECDF)에 대한 주요 통계값들의 박스 플롯을 62개 지역에 대해 나타낸 그래프이다.FIG. 8 shows a box plot of key statistics for a single time series (MS) calculated using the mean-standard deviation and selected single time series data (ECDF) using an empirical cumulative distribution function, according to an embodiment of the present invention. This is a graph of the regions.

도 8을 참조하면, 평균-표준편차를 이용하여 산출된 단일 시계열 데이터(MS)과 경험 누적분포함수를 이용하여 선택된 단일 시계열 데이터(ECDF)에 대한 평균(MEAN), 표준편차(STD), 왜도(SKEW), 첨도(KURT), 최대(MAX) 및 최소(MIN)의 박스 플롯을 62개 지역에 대해 나타낸 것을 알 수 있다.Referring to FIG. 8, the mean (MEAN), standard deviation (STD), why for a single time series data (MS) calculated using the mean-standard deviation and a single time series data (ECDF) selected using the empirical cumulative distribution function. It can be seen that the box plots of SKEW, kurtosis (KURT), MAX (MAX) and MIN (MIN) are shown for 62 regions.

도 9는 발명의 일실시예에 따른 평균-표준편차와 경험 누적분포함수에 대한 주요 통계값들의 박스 플롯을 나타낸 도면이다.FIG. 9 is a box plot of key statistical values for mean-standard deviation and empirical cumulative distribution function according to one embodiment of the invention. FIG.

도 9를 참조하면, 평균-표준편차를 이용하여 산출된 단일 시계열 데이터(MS)와 경험 누적분포함수를 이용하여 선택된 단일 시계열 데이터(ECDF)에 대한 평균(MEAN), 표준편차(STD), 왜도(SKEW), 첨도(KURT), 최대(MAX) 및 최소(MIN)를 박스 플롯으로 나타낸 것을 알 수 있다.Referring to FIG. 9, the mean (MEAN), standard deviation (STD), why for a single time series data (MS) calculated using the mean-standard deviation and a single time series data (ECDF) selected using the empirical cumulative distribution function. It can be seen that the degrees (SKEW), kurtosis (KURT), maximum (MAX) and minimum (MIN) are shown as box plots.

도 10은 본 발명의 일실시예에 따른 컴퓨터 시스템을 나타낸 도면이다.10 illustrates a computer system according to an embodiment of the present invention.

도 10을 참조하면, 본 발명의 일실시예에 따른 시계열 데이터 처리 장치는 컴퓨터로 읽을 수 있는 기록매체와 같은 컴퓨터 시스템(1100)에서 구현될 수도 있다. 도 10에 도시된 바와 같이, 컴퓨터 시스템(1100)은 버스(1120)를 통하여 서로 통신하는 하나 이상의 프로세서(1110), 메모리(1130), 사용자 인터페이스 입력 장치(1140), 사용자 인터페이스 출력 장치(1150) 및 저장 장치(1160)를 포함할 수 있다. 또한, 컴퓨터 시스템(1100)은 네트워크(1180)에 연결되는 네트워크 인터페이스(1170)를 더 포함할 수 있다. 프로세서(1110)는 중앙 처리 장치 또는 메모리(1130)나 저장 장치(1160)에 저장된 프로세싱 인스트럭션들을 실행하는 반도체 장치일 수 있다. 메모리(1130) 및 저장 장치(1160)는 다양한 형태의 휘발성 또는 비휘발성 저장 매체일 수 있다. 예를 들어, 메모리는 ROM(1131)이나 RAM(1132)을 포함할 수 있다.Referring to FIG. 10, an apparatus for processing time series data according to an embodiment of the present invention may be implemented in a computer system 1100 such as a computer-readable recording medium. As shown in FIG. 10, computer system 1100 may include one or more processors 1110, memory 1130, user interface input device 1140, user interface output device 1150 that communicate with each other via a bus 1120. And storage device 1160. In addition, the computer system 1100 may further include a network interface 1170 connected to the network 1180. The processor 1110 may be a central processing unit or a semiconductor device that executes processing instructions stored in the memory 1130 or the storage device 1160. The memory 1130 and the storage device 1160 may be various types of volatile or nonvolatile storage media. For example, the memory may include a ROM 1131 or a RAM 1132.

As described above, the apparatus and method for processing time series data according to the present invention are not limited to the configurations and methods of the embodiments described above; rather, all or part of the embodiments may be selectively combined so that various modifications can be made.

110: model input unit
120: time series processor
130: time series selector
1100: computer system 1110: processor
1120: bus 1130: memory
1131: ROM 1132: RAM
1140: user interface input device
1150: user interface output device
1160: storage device 1170: network interface
1180: network

Claims

A time series data processing apparatus comprising:
a model input unit configured to receive climate model data;
a time series processor configured to calculate a plurality of time series data from the climate model data using at least one of a mean-standard deviation, a probability density function estimate, and an empirical cumulative distribution function; and
a time series selector configured to select single time series data by calculating a median of the plurality of time series data,
wherein the time series selector:
calculates three medians for the three types of time series data calculated using the mean-standard deviation, the probability density function estimate, and the empirical cumulative distribution function, namely a median of the mean-standard deviation, a median of the probability density function estimate, and a median of the empirical cumulative distribution function;
selects three single time series data, each closest to one of the three medians, by
selecting first single time series data of the mean-standard deviation closest to the median of the mean-standard deviation,
selecting second single time series data of the probability density function estimate closest to the median of the probability density function estimate, and
selecting third single time series data of the empirical cumulative distribution function closest to the median of the empirical cumulative distribution function;
in order to compare the closeness between the median of the mean-standard deviation and the first single time series data,
the closeness between the median of the probability density function estimate and the second single time series data, and
the closeness between the median of the empirical cumulative distribution function and the third single time series data,
generates box plots for the first to third single time series data based on at least one key statistic among mean (MEAN), standard deviation (STD), skewness (SKEW), kurtosis (KURT), maximum (MAX), and minimum (MIN); and
selects the box plot whose maximum and minimum are closest to the median, and selects, as final time series data, whichever of the first to third single time series data has the most box plots selected.
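The median-based selection recited in this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `median_series` and `closest_to_median` are hypothetical, the median is taken pointwise across ensemble members of equal length, and closeness is measured by root-mean-square difference, a metric the claim does not specify.

```python
import math
from statistics import median


def median_series(ensemble):
    """Pointwise median across ensemble members of equal length."""
    return [median(values) for values in zip(*ensemble)]


def closest_to_median(ensemble):
    """Return (index, distance) of the member nearest the pointwise median."""
    target = median_series(ensemble)

    def rms_difference(member):
        # Assumed closeness metric: root-mean-square difference.
        squared = sum((a - b) ** 2 for a, b in zip(member, target))
        return math.sqrt(squared / len(target))

    distances = [rms_difference(member) for member in ensemble]
    best = min(range(len(ensemble)), key=distances.__getitem__)
    return best, distances[best]
```

Running `closest_to_median` once per method (mean-standard deviation, probability density function estimate, empirical cumulative distribution function) would give the first to third single time series data described above.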

(Deleted)

The apparatus according to claim 1,
wherein the time series selector
converts the single time series data calculated using the probability density function estimate and the empirical cumulative distribution function into the single time series data format of the climate model data, calculates a median together with the single time series data calculated using the mean-standard deviation, and selects the single time series data closest to the median as final time series data.

(Deleted)

The apparatus according to claim 1,
wherein the time series selector
selects the two single time series data having the most box plots selected, and selects one of them as final time series data in consideration of the maximum (MAXIMUM), minimum (MINIMUM), and outliers (OUTLIER) of the box plots of the two single time series data.
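One way such a comparison between two candidates could be carried out is sketched below. The claim does not say how the maximum, minimum, and outliers are weighed, so everything here is an assumption: outliers are identified with Tukey's 1.5 x IQR fences, the candidate with fewer outliers is preferred, and a narrower maximum-to-minimum range breaks any remaining tie. The names `tukey_outliers` and `break_tie` are hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return values outside the Tukey fences (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def break_tie(candidate_a, candidate_b):
    """Pick between two (name, values) candidates.

    Assumed rule: fewer outliers wins, then the narrower max-min range.
    """
    def badness(candidate):
        _, values = candidate
        return (len(tukey_outliers(values)), max(values) - min(values))

    return min((candidate_a, candidate_b), key=badness)[0]
```

For example, a candidate whose statistic values are `[1, 2, 3, 4, 100]` has one outlier under this rule, so a candidate with values `[1, 2, 3, 4, 5]` would be preferred.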

A time series data processing method of a time series data processing apparatus, the method comprising:
receiving climate model data;
calculating a plurality of time series data from the climate model data using at least one of a mean-standard deviation, a probability density function estimate, and an empirical cumulative distribution function; and
selecting single time series data by calculating a median of the plurality of time series data,
wherein the selecting of the single time series data comprises:
calculating three medians for the three types of time series data calculated using the mean-standard deviation, the probability density function estimate, and the empirical cumulative distribution function, namely a median of the mean-standard deviation, a median of the probability density function estimate, and a median of the empirical cumulative distribution function;
selecting three single time series data, each closest to one of the three medians, by
selecting first single time series data of the mean-standard deviation closest to the median of the mean-standard deviation,
selecting second single time series data of the probability density function estimate closest to the median of the probability density function estimate, and
selecting third single time series data of the empirical cumulative distribution function closest to the median of the empirical cumulative distribution function;
in order to compare the closeness between the median of the mean-standard deviation and the first single time series data,
the closeness between the median of the probability density function estimate and the second single time series data, and
the closeness between the median of the empirical cumulative distribution function and the third single time series data,
generating box plots for the first to third single time series data based on at least one key statistic among mean (MEAN), standard deviation (STD), skewness (SKEW), kurtosis (KURT), maximum (MAX), and minimum (MIN); and
selecting the box plot whose maximum and minimum are closest to the median, and selecting, as final time series data, whichever of the first to third single time series data has the most box plots selected.
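The box plot comparison in the final step of this method can be sketched as a simple vote. This is an illustrative reading rather than the claimed implementation: for each key statistic, the method whose box plot has its maximum and minimum closest to its median receives one vote, and the method with the most votes is chosen. The names `whisker_spread` and `vote_by_box_plots` are hypothetical, the raw maximum and minimum are used instead of whisker ends, and tied vote counts are not resolved here.

```python
from collections import Counter
from statistics import median


def whisker_spread(values):
    """Sum of the distances from the median to the maximum and minimum."""
    med = median(values)
    return (max(values) - med) + (med - min(values))


def vote_by_box_plots(stats_by_method):
    """Count, per statistic, which method has the tightest box plot.

    stats_by_method maps method name -> {statistic name: list of values}.
    Returns (winning method, votes per method).
    """
    methods = list(stats_by_method)
    statistic_names = list(stats_by_method[methods[0]])
    votes = Counter()
    for name in statistic_names:
        tightest = min(
            methods, key=lambda m: whisker_spread(stats_by_method[m][name])
        )
        votes[tightest] += 1
    winner, _ = votes.most_common(1)[0]
    return winner, dict(votes)
```

The input could be built from the six statistics (MEAN, STD, SKEW, KURT, MAX, MIN) computed for the MS, PDF, and ECDF series; with two statistics and made-up values it looks like this:

```python
stats = {
    "MS": {"MEAN": [1, 2, 3], "STD": [1, 5, 9]},
    "PDF": {"MEAN": [0, 2, 6], "STD": [4, 5, 6]},
    "ECDF": {"MEAN": [1.5, 2, 2.5], "STD": [4.5, 5, 5.5]},
}
winner, votes = vote_by_box_plots(stats)
```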