KR20210035622A

KR20210035622A - Time series data similarity calculation system and method

Info

Publication number: KR20210035622A
Application number: KR1020190117644A
Authority: KR
Inventors: 정인영; 송인성; 김원일; 신민철
Original assignee: 주식회사 디셈버앤컴퍼니자산운용
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2021-04-01
Also published as: KR102536201B1

Abstract

The present invention relates to a system and method for calculating the similarity of time series data based on the characteristics of the time series data. The system comprises: a time series data receiving unit that receives first time series data for a first item and second time series data for a second item; a time series data processing unit that processes the received first and second time series data to generate first time series analysis data and second time series analysis data; a section dividing unit that divides each of the first and second time series analysis data into a plurality of sections to generate a plurality of pieces of first time series analysis sub-data and a plurality of pieces of second time series analysis sub-data; a first generator that takes the first time series analysis sub-data as input data and generates a plurality of pieces of first transformed data matching the attributes of the second time series analysis sub-data; a second generator that takes the first transformed data as input data and generates a plurality of pieces of first reconstructed data matching the attributes of the first time series analysis sub-data; a discriminator that takes the first transformed data generated by the first generator and the second time series analysis sub-data as input values and discriminates the first transformed data; and a similarity calculation unit that obtains a generator loss value (the loss function value of the first generator), a discriminator loss value (the loss function value of the discriminator), and a cycle loss value (the loss function value between the first time series analysis sub-data and the first reconstructed data), and calculates the similarity between the first time series data and the second time series data based on these three loss values.

Description

Time series data similarity calculation system and method

The present invention relates to a system and method for calculating the similarity of time series data, and more particularly to a system and method that calculate the similarity of time series data more accurately by basing the calculation on the characteristics of the time series data.

Time series data refers to a sequence of data points arranged at regular time intervals. Time series analysis builds a mathematical model from a given time series and analyzes the data based on that model; it is increasingly used in engineering, scientific computing, and stock price prediction.

The prior art, Korean Patent Registration No. 10-1908786, "Data Similarity Evaluation System," discloses a technology for evaluating the similarity of data such as multiple time series and two- and three-dimensional spatial distribution data. However, it evaluates similarity only by values or ratios that change relative to a reference value among the time series. This makes it difficult to evaluate the similarity of stock price time series, which by their nature contain a rapidly oscillating random component driven by the sentiment of many investors. There is therefore a need for a system and method that can calculate similarity from stock price time series data that reflect such random components.

Korean Patent Registration No. 10-1908786

An object of the present invention is to calculate the similarity of time series data more accurately by basing the calculation on the characteristics of the time series data.

Another object of the present invention is to process the data so that the trend of the time series data can be seen, by generating the time series analysis data from daily cumulative returns calculated relative to the first daily price of each of the first and second time series data.

A further object of the present invention is to calculate the similarity of time series data more accurately by taking dates spaced by a first reference value in the time series analysis data as the start dates of the sections, and generating time series analysis sub-data whose length equals a second reference value from each start date.

To achieve these objects, a time series data similarity calculation system according to an embodiment of the present invention may include: a time series data receiving unit that receives first time series data for a first item and second time series data for a second item; a time series data processing unit that processes the received first and second time series data to generate first time series analysis data and second time series analysis data; a section dividing unit that divides each of the first and second time series analysis data into a plurality of sections to generate a plurality of pieces of first time series analysis sub-data and a plurality of pieces of second time series analysis sub-data; a first generator that takes the first time series analysis sub-data as input data and generates a plurality of pieces of first transformed data matching the attributes of the second time series analysis sub-data; a second generator that takes the generated first transformed data as input data and generates a plurality of pieces of first reconstructed data matching the attributes of the first time series analysis sub-data; a discriminator that takes the first transformed data generated by the first generator and the second time series analysis sub-data as input values and discriminates the first transformed data; and a similarity calculation unit that obtains a generator loss value, which is the loss function value of the first generator, a discriminator loss value, which is the loss function value of the discriminator, and a cycle loss value, which is the loss function value between the first time series analysis sub-data and the first reconstructed data, and calculates the similarity between the first time series data and the second time series data based on these values.

In addition, the first time series data and the second time series data may include daily price information of the first item and daily price information of the second item, respectively, and the time series data processing unit may generate the first time series analysis data from daily cumulative returns of the first item calculated relative to the first daily price of the first time series data, and generate the second time series analysis data from daily cumulative returns of the second item calculated relative to the first daily price of the second time series data.

In addition, the section dividing unit may take dates increasing by a first reference value from the first day of the first time series analysis data as the start dates of the sections and generate first time series analysis sub-data whose length equals a second reference value from each start date, and likewise take dates increasing by the first reference value from the first day of the second time series analysis data as the start dates of the sections and generate second time series analysis sub-data whose length equals the second reference value from each start date.

In addition, the first reference value may be smaller than the second reference value.

In addition, the section dividing unit may generate the first and second time series analysis sub-data as values normalized to the range from -1.0 to 1.0 inclusive.

In addition, the similarity calculation unit may further obtain a generator loss value, a discriminator loss value, and a cycle loss value for the case in which the first time series data and the second time series data are input in reverse order, and calculate the similarity between the first and second time series data using all six loss values obtained.

The present invention makes it possible to calculate the similarity of time series data more accurately by basing the calculation on the characteristics of the time series data.

The present invention generates the time series analysis data from daily cumulative returns calculated relative to the first daily price of each of the first and second time series data, so that the processed data reveal the trend of the time series.

The present invention takes dates spaced by the first reference value in the time series analysis data as the start dates of the sections and generates time series analysis sub-data whose length equals the second reference value from each start date, so that the similarity of the time series data can be calculated more accurately.

FIG. 1 is a diagram illustrating an example of classifying images based on Cycle GAN according to an embodiment of the present invention.
FIG. 2 is a block diagram of a system for calculating the similarity of time series data according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of distinguishing between a first transformed image and second time series analysis sub-data based on time series data according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of generating second time series analysis sub-data based on a first reference value and a second reference value according to an embodiment of the present invention.
FIG. 5 is a flowchart of a method of calculating the similarity of time series data according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Detailed descriptions of related known configurations or functions are omitted where they could obscure the subject matter of the present invention. Specific numerical values used in describing the embodiments are merely examples and do not limit the scope of the invention.

The system for calculating the similarity of time series data according to the present invention may take the form of a server that has a central processing unit (CPU) and memory and can connect to other terminals through a communication network such as the Internet. However, the present invention is not limited to such a configuration of CPU and memory. The system may also be a single physical device or be distributed across multiple devices; the present invention is not limited by the physical device configuration.

FIG. 1 is a diagram illustrating an example of classifying images based on Cycle GAN according to an embodiment of the present invention.

A GAN learns to imitate data of any distribution and can therefore generate data in virtually any domain. It builds two models, a generator and a discriminator (260), and trains them against each other (adversarial training).

For example, to transform a zebra image, a generator that reflects the characteristics of horse images may be built, and the generator may then produce a transformed image. The discriminator 260 receives transformed images and real horse images and determines whether each input is a transformed image or a horse image.
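The adversarial training described above can be illustrated with the standard binary cross-entropy GAN losses. The sketch below is framework-free and assumes the discriminator outputs the probability that its input is real; the function names are hypothetical, and the patent does not state which loss form it uses.

```python
import math


def bce(prob: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy of one probability against a 0/1 target."""
    p = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real: list[float], d_fake: list[float]) -> float:
    """The discriminator is rewarded for D(real) -> 1 and D(generated) -> 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: list[float]) -> float:
    """Non-saturating generator loss: the generator is rewarded for D(generated) -> 1."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell the two apart outputs 0.5 everywhere.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))  # 2 * ln(2), about 1.386
print(generator_loss([0.5, 0.5]))                  # ln(2), about 0.693
```

Training alternates between lowering `discriminator_loss` with respect to the discriminator's parameters and lowering `generator_loss` with respect to the generator's, which is the adversarial game the paragraph describes.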

FIG. 2 is a block diagram of the system for calculating the similarity of time series data according to an embodiment of the present invention.

The system for calculating the similarity of time series data according to the present invention may include a time series data receiving unit 210, a time series data processing unit 220, a section dividing unit 230, a first generator 240, a second generator 250, a discriminator 260, and a similarity calculation unit 270. These components may be software modules running within a single physical computer system, or they may be distributed across two or more physically separate computer systems that operate in conjunction; various embodiments providing the same functions fall within the scope of the present invention.

The time series data receiving unit 210 receives first time series data for a first item and second time series data for a second item.

The first and second items may be specific securities or financial products, for example exchange-traded funds (ETFs). When both items are ETFs, they may track the same underlying asset, and items that track the same underlying asset may be identified as similar items.

The first time series data may be a sequence of the first item's price information over a given period, arranged at regular time intervals, and the second time series data may be the corresponding sequence for the second item. The time series data receiving unit 210 may receive the first time series data for the first item and the second time series data for the second item.

The first time series data and the second time series data include daily price information of the first item and of the second item, respectively.

The first time series data may include daily price information of the first item and may be expressed as a function of time based on that daily price information. Likewise, the second time series data may include daily price information of the second item and may be expressed as a function of time based on that information.

The time series data processing unit 220 processes the received first and second time series data to generate first time series analysis data and second time series analysis data.

Time series analysis data may be data obtained by processing time series data. The time series data processing unit 220 may process the first and second time series data into a form that can be analyzed: the first time series analysis data may be obtained by processing the first time series data, and the second time series analysis data by processing the second time series data.

The time series data processing unit 220 generates the first time series analysis data from daily cumulative returns of the first item calculated relative to the first daily price of the first time series data, and generates the second time series analysis data from daily cumulative returns of the second item calculated relative to the first daily price of the second time series data.

The first daily price information of the first time series data may be the price on the first day of the first time series data. Using that first-day price as the base, the time series data processing unit 220 may calculate the daily cumulative return for each day from the second day through the last day and generate the first time series analysis data from these values. Likewise, the first daily price information of the second time series data may be the price on the first day of the second time series data; the daily cumulative return for that first day is 0, and the second time series analysis data may be generated from the daily cumulative returns from the second day through the last day. In general, the time series data processing unit 220 may calculate cumulative returns from the daily price information of the first and second items, using the first daily price at a specific starting point as the base. The cumulative return may be calculated as

$$R_t = \frac{P_t - P_1}{P_1}, \qquad t = 1, 2, \ldots, T$$

where P_t is the price on day t, R_t is the cumulative return on day t, t is the day index, and T is the total number of business days in the period.

For example, the daily prices in the first time series data may be 100 won on the first day, 90 won on the second, 110 won on the third, 120 won on the fourth, and 130 won on the fifth. Starting from 0 on the first day, the time series data processing unit 220 may arrange cumulative returns of -10% for the second day, 10% for the third, 20% for the fourth, and 30% for the fifth to generate the first time series analysis data.
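The cumulative-return step in this example can be sketched as follows, assuming the daily prices arrive as a plain list ordered by date (the function name is hypothetical). It reproduces the 100, 90, 110, 120, 130 won example.

```python
def cumulative_returns(prices: list[float]) -> list[float]:
    """Cumulative return of each day relative to the first day's price.

    Day 1 maps to 0.0 and day t maps to (P_t - P_1) / P_1.
    """
    if not prices:
        return []
    base = prices[0]
    if base == 0:
        raise ValueError("the first-day price must be non-zero")
    # Subtract before dividing: 90 vs 100 then yields -0.1 instead of
    # the -0.09999999999999998 that 90 / 100 - 1 produces in floating point.
    return [(p - base) / base for p in prices]


print(cumulative_returns([100, 90, 110, 120, 130]))
# [0.0, -0.1, 0.1, 0.2, 0.3]
```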

The section dividing unit 230 divides each of the first and second time series analysis data into a plurality of sections to generate a plurality of pieces of first time series analysis sub-data and a plurality of pieces of second time series analysis sub-data.

The section dividing unit 230 may divide each of the first and second time series analysis data into a plurality of sections based on predetermined reference values, generating first time series analysis sub-data from the divided first time series analysis data and second time series analysis sub-data from the divided second time series analysis data.

The section dividing unit 230 takes dates increasing by the first reference value from the first day of the first time series analysis data as the start dates of the sections and generates first time series analysis sub-data whose length equals the second reference value from each start date. In the same way, it takes dates increasing by the first reference value from the first day of the second time series analysis data as the start dates of the sections and generates second time series analysis sub-data whose length equals the second reference value from each start date.

The first and second reference values may be numbers of days used to divide the time series analysis data. The section dividing unit 230 may step through the time series analysis data from the first day to the last day in increments of the first reference value. Each step point obtained this way in the first and second time series analysis data may serve as a section start date, and from each start date the section dividing unit 230 may take a span whose length equals the second reference value. The spans taken from the first and second time series analysis data become the first and second time series analysis sub-data, respectively.

For example, suppose the first reference value is 5 days, the second reference value is 40 days, and the first and second time series analysis data each span 60 days. Stepping by the first reference value from the first day, the section dividing unit 230 may divide the 60 days into 5-day blocks: days 1 to 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, 31 to 35, 36 to 40, 41 to 45, 46 to 50, 51 to 55, and 56 to 60. Generating sub-data whose length equals the second reference value from the start of each block then yields the first and second time series analysis sub-data for days 1 to 40, 6 to 45, 11 to 50, 16 to 55, and 21 to 60.
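The overlapping split in this example can be sketched as a strided sliding window. The sketch assumes the analysis data are a plain list indexed by day and that windows running past the last day are dropped, which matches the example's final window of days 21 to 60; the function name is hypothetical.

```python
def split_into_windows(series: list[float], stride: int, length: int) -> list[list[float]]:
    """Overlapping sub-sequences: a new window starts every `stride` days
    (first reference value) and spans `length` days (second reference value).
    Windows that would run past the end of the series are dropped."""
    if stride <= 0 or length <= 0:
        raise ValueError("stride and length must be positive")
    return [series[start:start + length]
            for start in range(0, len(series) - length + 1, stride)]


days = list(range(1, 61))  # day numbers 1..60 stand in for the analysis data
windows = split_into_windows(days, stride=5, length=40)
print([(w[0], w[-1]) for w in windows])
# [(1, 40), (6, 45), (11, 50), (16, 55), (21, 60)]
```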

The first reference value is smaller than the second reference value.

Because the first and second time series analysis data are first divided by the first reference value, and a span of the second reference value is then taken from the first day of each division, the second reference value should be larger than the first. If the first reference value were larger than the second, each span would end before the next start date, so some dates or periods would belong to neither the first nor the second time series analysis sub-data.

For example, suppose the first reference value is 20 days, the second reference value is 15 days, and the first and second time series analysis data each span 60 days. Dividing by the first reference value gives days 1 to 20, 21 to 40, and 41 to 60, and generating sub-data whose length equals the second reference value from each start date gives days 1 to 15, 21 to 35, and 41 to 55. When the first reference value is larger than the second in this way, the values of the first and second time series analysis data for days 16 to 20, 36 to 40, and 56 to 60 are not included in any first or second time series analysis sub-data. The first reference value should therefore be smaller than the second reference value.
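The coverage argument can be checked by enumerating which days fall inside at least one window. This sketch assumes windows start every `stride` days from day 1 and that only full-length windows are kept; the function name is hypothetical.

```python
def uncovered_days(total_days: int, stride: int, length: int) -> list[int]:
    """Return the 1-based days that no full-length window covers, when a window
    starts every `stride` days (first reference value) and spans `length` days
    (second reference value)."""
    covered: set[int] = set()
    for start in range(1, total_days - length + 2, stride):
        covered.update(range(start, start + length))
    return [day for day in range(1, total_days + 1) if day not in covered]


print(uncovered_days(60, stride=20, length=15))
# [16, 17, 18, 19, 20, 36, 37, 38, 39, 40, 56, 57, 58, 59, 60]
print(uncovered_days(60, stride=5, length=40))
# []
```

The first call reproduces the gaps in the 20-day/15-day example; the second confirms that the 5-day/40-day setting leaves no day out.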

The section dividing unit 230 generates the first time series analysis sub-data and the second time series analysis sub-data as values normalized to the range of -1.0 to 1.0 inclusive.

The section dividing unit 230 may generate the first time series analysis sub-data and the second time series analysis sub-data as normalized values, for example by linearly scaling them.

For example, cur_date may denote the first reference value, A may denote the first time series analysis sub-data, and B may denote the second time series analysis sub-data. The equations for linearly scaling the first and second time series analysis sub-data may be as follows.

[Example of the normalization equation for the first time series analysis sub-data]

[Example of the normalization equation for the second time series analysis sub-data]
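The normalization equations themselves are not reproduced in this text. A common linear scaling onto [-1.0, 1.0] is min-max scaling, sketched below in Python as an assumption; whether each window is scaled by its own minimum and maximum, as here, or jointly with the other series is not stated, and the name `scale_to_unit_range` is hypothetical.

```python
def scale_to_unit_range(window: list[float]) -> list[float]:
    """Linearly map values onto [-1.0, 1.0] using the window's own min and max."""
    lo, hi = min(window), max(window)
    if hi == lo:
        # A flat window carries no shape information; map it to the midpoint 0.0.
        return [0.0 for _ in window]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in window]


window_a = [0.00, 0.02, 0.05, 0.01]  # hypothetical cumulative returns for one window
print(scale_to_unit_range(window_a))  # minimum maps to -1.0, maximum to 1.0
```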

The first generator 240 takes the plurality of first time series analysis sub-data as input data and generates a plurality of first transformed data that match the attributes of the plurality of second time series analysis sub-data.

The first time series analysis sub-data and the second time series analysis sub-data may each have their own attributes or characteristics. The first generator 240 may take the plurality of first time series analysis sub-data as input data and generate the plurality of first transformed data by imposing the characteristics or attributes of the second time series analysis sub-data on the first time series analysis sub-data. Generating the first transformed data may yield a loss function value for the first generator 240, which may express the degree of error or difference between the first transformed data and the plurality of second time series analysis sub-data.

The second generator 250 takes the generated plurality of first transformed data as input data and generates a plurality of first reconstructed data that match the attributes of the plurality of first time series analysis sub-data.

The first time series analysis sub-data may have its own attributes or characteristics, and the second generator 250 may take the plurality of first transformed data produced by the first generator 240 as input data. The second generator 250 may generate the plurality of first reconstructed data by imposing the characteristics of the plurality of first time series analysis sub-data on the received first transformed data. Generating the first reconstructed data may yield a loss function value between the plurality of first time series analysis sub-data and the plurality of first reconstructed data, which may express the degree of error or difference between the two.
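The form of this reconstruction loss is not specified. The Python sketch below assumes a mean absolute (L1) error, as used for the cycle-consistency term in CycleGAN-style models, averaged over every value of every window; the name `cycle_loss` is hypothetical.

```python
def cycle_loss(originals: list[list[float]], reconstructions: list[list[float]]) -> float:
    """Mean absolute error between each original window and its reconstruction."""
    if len(originals) != len(reconstructions):
        raise ValueError("need one reconstruction per original window")
    total, count = 0.0, 0
    for orig, recon in zip(originals, reconstructions):
        if len(orig) != len(recon):
            raise ValueError("window lengths differ")
        total += sum(abs(o - r) for o, r in zip(orig, recon))
        count += len(orig)
    return total / count


first_sub = [[0.0, 0.5], [1.0, -1.0]]        # hypothetical normalized windows
reconstructed = [[0.1, 0.5], [0.8, -1.0]]    # hypothetical second-generator output
print(cycle_loss(first_sub, reconstructed))  # small, since reconstructions are close
```

A smaller value means the round trip through both generators preserved more of the first series' shape.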

The discriminator 260 takes the plurality of first transformed data generated by the first generator 240 and the plurality of second time series analysis sub-data as input values and discriminates the first transformed data.

The discriminator 260 may take the plurality of first transformed data generated by the first generator 240 and the plurality of second time series analysis sub-data as input values. The discriminator 260 may determine whether a given input is first transformed data or genuine second time series analysis sub-data, and this discrimination may yield a loss function value for the discriminator 260. That value may express the degree of error or difference between the first transformed data and the second time series analysis sub-data.
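The specification does not fix the form of the generator and discriminator losses. The Python sketch below assumes the least-squares GAN formulation often used in CycleGAN-style models, computed from hypothetical discriminator scores where 1 means "judged to be genuine second time series analysis sub-data" and 0 means "judged to be transformed data"; the function names are illustrative only.

```python
def _mean(values: list[float]) -> float:
    return sum(values) / len(values)


def discriminator_loss(real_scores: list[float], fake_scores: list[float]) -> float:
    # Genuine second-series windows should score 1, first transformed data 0.
    real_term = _mean([(s - 1.0) ** 2 for s in real_scores])
    fake_term = _mean([s ** 2 for s in fake_scores])
    return 0.5 * (real_term + fake_term)


def generator_loss(fake_scores: list[float]) -> float:
    # The first generator wants its transformed windows to be scored as genuine (1).
    return _mean([(s - 1.0) ** 2 for s in fake_scores])


real = [0.9, 0.8]  # hypothetical scores on second time series analysis sub-data
fake = [0.3, 0.1]  # hypothetical scores on first transformed data
print(discriminator_loss(real, fake), generator_loss(fake))
```

Under this assumption, a high generator loss means the discriminator easily tells the transformed data apart from the second series.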

The similarity calculation unit 270 obtains a generator loss value, which is the loss function value of the first generator 240; a discriminator loss value, which is the loss function value of the discriminator 260; and a cycle loss value, which is the loss function value between the plurality of first time series analysis sub-data and the plurality of first reconstructed data. Based on these values, it calculates the similarity between the first time series data and the second time series data.

The similarity calculation unit 270 may calculate the similarity between the first time series data and the second time series data based on the generator loss value, the discriminator loss value, and the cycle loss value. The similarity value may serve as a measure of how alike the first time series analysis data and the second time series analysis data are, but it behaves like a distance: a lower value means the difference between the compared data is small, so the two series are more similar, and a higher value means the difference is large, so the two series are less similar. The similarity calculation unit 270 may assign a weight to each of the generator loss value, the discriminator loss value, and the cycle loss value when calculating the similarity.
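A weighted combination of the three losses might look like the Python sketch below. It assumes a simple weighted sum; the default weights of 1.0, the function name `dissimilarity_score`, and the loss values for the candidate items are all hypothetical.

```python
def dissimilarity_score(gen_loss: float, disc_loss: float, cyc_loss: float,
                        weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three losses; a lower score means the series are more alike."""
    w_gen, w_disc, w_cyc = weights
    return w_gen * gen_loss + w_disc * disc_loss + w_cyc * cyc_loss


# Hypothetical losses from comparing one reference item against two candidates.
candidates = {
    "item_B": dissimilarity_score(0.40, 0.20, 0.05),
    "item_C": dissimilarity_score(0.90, 0.35, 0.30),
}
closest = min(candidates, key=candidates.get)
print(closest)  # item_B, which has the smallest combined loss
```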

The similarity calculation unit 270 additionally obtains a generator loss value, a discriminator loss value, and a cycle loss value for the case where the first time series data and the second time series data are input in reverse order, and calculates the similarity between the first time series data and the second time series data using all six loss values.

When the first time series data and the second time series data are input in reverse, the first generator 240 may take, from the sub-data generated by the section dividing unit 230, the plurality of second time series analysis sub-data as input data and generate a plurality of first transformed data that match the attributes of the plurality of first time series analysis sub-data. The second generator 250 may then take those first transformed data as input data and generate a plurality of first reconstructed data that match the attributes of the plurality of second time series analysis sub-data. The discriminator 260 may likewise take the plurality of first transformed data generated by the first generator 240 and the plurality of first time series analysis sub-data as input values and discriminate the first transformed data. In this way, the similarity calculation unit 270 may obtain a generator loss value, a discriminator loss value, and a cycle loss value for the case where the first time series analysis sub-data is the input to the first generator 240, and the same three values for the case where the second time series analysis sub-data is the input. The similarity between the first time series data and the second time series data may then be calculated from these six loss values.

The similarity calculation unit 270 may calculate the similarity between the first time series data and the second time series data based on three averages: the average of the generator loss values obtained with the first time series analysis sub-data as input and with the second time series analysis sub-data as input; the average of the two corresponding discriminator loss values; and the average of the two corresponding cycle loss values.
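The averaging over both input orders can be sketched in Python as follows. The `DirectionLosses` container, the function name `bidirectional_score`, the equal default weights, and the loss values are hypothetical; the final weighted sum is an assumption about how the three averages are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectionLosses:
    """Losses from one pass, labelled by which series was fed to the first generator."""
    generator: float
    discriminator: float
    cycle: float


def bidirectional_score(forward: DirectionLosses, backward: DirectionLosses,
                        weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    # Average each loss type over the two input orders, then take a weighted sum.
    gen = (forward.generator + backward.generator) / 2
    disc = (forward.discriminator + backward.discriminator) / 2
    cyc = (forward.cycle + backward.cycle) / 2
    w_gen, w_disc, w_cyc = weights
    return w_gen * gen + w_disc * disc + w_cyc * cyc


a_to_b = DirectionLosses(generator=0.6, discriminator=0.2, cycle=0.1)  # first series as input
b_to_a = DirectionLosses(generator=0.4, discriminator=0.3, cycle=0.3)  # second series as input
print(bidirectional_score(a_to_b, b_to_a))
```

Because each loss type is averaged over both directions, swapping the two items gives the same score, which a single-direction loss would not guarantee.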

For example, the similarity calculation unit 270 may receive the generator loss value, the discriminator loss value, and the cycle loss value obtained with the first time series data as input, and the same three values obtained with the second time series data as input, and may calculate the similarity between the first time series data and the second time series data from these six loss values.

FIG. 3 is a diagram illustrating an example of distinguishing between a first transformed image and second time series analysis sub-data based on time series data, according to an embodiment of the present invention.

For example, the first generator 240 may refer to the second time series analysis sub-data and convert the first time series analysis sub-data to fit the characteristics of the second time series analysis sub-data, thereby generating the first transformed data; a generator loss value may arise when the first transformed data is generated. The second generator 250 may refer to the first time series analysis sub-data and convert the first transformed data generated by the first generator 240 to fit the characteristics of the first time series analysis sub-data, thereby generating the first reconstructed data; a cycle loss value may arise when the first reconstructed data is generated. The discriminator 260 may receive the first transformed data and the second time series analysis sub-data and determine which of the two is the first transformed data; a discriminator loss value may arise from this discrimination.

FIG. 4 is a diagram illustrating an example of generating second time series analysis sub-data based on a first reference value and a second reference value, according to an embodiment of the present invention.

For example, suppose the first reference value is 5 days, the second reference value is 40 days, and the first and second time series analysis data each span 60 days. Based on the first reference value, the section dividing unit 230 may divide the 60 days of each series into 5-day steps starting from the first day, giving days 1 to 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30, 31 to 35, 36 to 40, 41 to 45, 46 to 50, 51 to 55, and 56 to 60. Then, generating sub-data of second-reference-value length from the start of each division, the section dividing unit 230 may produce days 1 to 40, 6 to 45, 11 to 50, 16 to 55, and 21 to 60 as the first time series analysis sub-data and the second time series analysis sub-data. Windows starting on day 26 or later would extend past day 60, so only these five are generated.
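The windowing in this example can be sketched in Python as list slicing. The sketch assumes the first reference value is the step between window starts, the second reference value is the window length, and only full-length windows are kept; `split_into_windows` and the stand-in series are hypothetical.

```python
def split_into_windows(series: list[float], stride: int, length: int) -> list[tuple[int, list[float]]]:
    """Return (start_day, window) pairs; start_day is 1-based, windows are full-length."""
    windows = []
    for start in range(0, len(series) - length + 1, stride):
        windows.append((start + 1, series[start:start + length]))
    return windows


first_analysis = [float(day) for day in range(1, 61)]    # stand-in for 60 days of data
second_analysis = [float(-day) for day in range(1, 61)]  # stand-in for the second item

first_sub = split_into_windows(first_analysis, stride=5, length=40)
second_sub = split_into_windows(second_analysis, stride=5, length=40)
for (start, window), (_, other) in zip(first_sub, second_sub):
    # Five windows: days 1-40, 6-45, 11-50, 16-55, 21-60.
    print(f"days {start}-{start + len(window) - 1}: {len(window)} + {len(other)} values")
```

Windowing both series with the same parameters keeps their sub-data aligned day for day, which the paired generators rely on.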

FIG. 5 is a flowchart illustrating a method of calculating the similarity of time series data according to an embodiment of the present invention.

The time series data similarity calculation system described above may also be realized as a time series data similarity calculation method, to which the technical features of the system apply as they are.

Such a time series data similarity calculation method may be implemented as an application, or as program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software.

The time series data similarity calculation method operates in a time series data similarity calculation system having a central processing unit and a memory. In the step of receiving time series data (S501), the time series data receiving unit 210 receives first time series data for a first item and second time series data for a second item.

In the step of processing the time series data (S502), the time series data processing unit 220 processes the received first time series data and second time series data to generate first time series analysis data and second time series analysis data.

Time series analysis data may be data obtained by processing time series data. The step of processing the time series data (S502) may convert the first time series data for the first item and the second time series data for the second item into a form that can be analyzed: the first time series analysis data may be the processed first time series data, and the second time series analysis data may be the processed second time series data.

In the step of processing the time series data (S502), the first time series analysis data is generated from the daily cumulative returns of the first item, calculated relative to the first daily price in the first time series data, and the second time series analysis data is generated from the daily cumulative returns of the second item, calculated relative to the first daily price in the second time series data.

The first daily price information of the first time series data may be the price on the day the first time series data begins. Step S502 may use this first-day price as the base, calculate the daily cumulative return for each day from the second day to the last day, and generate the first time series analysis data from those values. Likewise, the first daily price information of the second time series data may be the price on the day the second time series data begins; the daily cumulative return for that first day may be 0, and the second time series analysis data may be generated from the daily cumulative returns from the second day to the last day. In general, step S502 may calculate cumulative returns for the first item and the second item from their daily price information, using the first daily price at a chosen starting point as the base.
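A minimal Python sketch of this step follows. It assumes simple cumulative returns, (price on day t / price on day 1) - 1, which gives 0 on the first day as described; the exact return definition is not stated in this section, and the function name and prices are hypothetical.

```python
def cumulative_returns(prices: list[float]) -> list[float]:
    """Cumulative return of each day relative to the first day's price."""
    if not prices or prices[0] == 0:
        raise ValueError("need a non-empty price series with a non-zero first price")
    base = prices[0]
    return [price / base - 1.0 for price in prices]


first_item_prices = [100.0, 102.0, 99.0, 105.0]  # hypothetical daily closing prices
print(cumulative_returns(first_item_prices))      # first value is 0.0
```

Expressing both items relative to their own starting price puts series with very different price levels on a comparable scale before windowing.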

In the step of dividing into sections (S503), the section dividing unit 230 divides the first time series analysis data and the second time series analysis data into a plurality of sections each, generating a plurality of first time series analysis sub-data and a plurality of second time series analysis sub-data.

The step of dividing into sections (S503) may divide the first time series analysis data and the second time series analysis data into a plurality of sections based on predetermined reference values. The divided first time series analysis data may be generated as the first time series analysis sub-data, and the divided second time series analysis data may be generated as the second time series analysis sub-data.

In the step of dividing into sections (S503), section start dates are set at intervals of the first reference value from the first day of the first time series analysis data, and first time series analysis sub-data with a length of the second reference value is generated from each start date. In the same way, section start dates are set at intervals of the first reference value from the first day of the second time series analysis data, and second time series analysis sub-data with a length of the second reference value is generated from each start date.

The first reference value and the second reference value may be numbers of days used to divide the time series analysis data. Step S503 may divide the time series analysis data from its first day to its last day in steps of the first reference value. The start day of each division made with the first reference value may serve as the start day of a window, and each window may extend for the second reference value's number of days from that start day. The first and second time series analysis data divided in this way by the second reference value may be generated as the first time series analysis sub-data and the second time series analysis sub-data, respectively.

In the step of dividing into sections (S503), the first time series analysis sub-data and the second time series analysis sub-data are generated as values normalized to the range of -1.0 to 1.0 inclusive.

Step S503 may generate the first time series analysis sub-data and the second time series analysis sub-data as normalized values, for example by linearly scaling them.

In the step of generating the first transformed data (S504), the first generator 240 takes the plurality of first time series analysis sub-data as input data and generates a plurality of first transformed data that match the attributes of the plurality of second time series analysis sub-data.

The first time series analysis sub-data and the second time series analysis sub-data may each have their own attributes or characteristics. Step S504 may take the plurality of first time series analysis sub-data as input data and generate the plurality of first transformed data by imposing the characteristics or attributes of the second time series analysis sub-data on the first time series analysis sub-data. Generating the first transformed data in step S504 may yield a loss function value, which may express the degree of error or difference between the first transformed data and the plurality of second time series analysis sub-data.

In the step of generating the first reconstructed data (S505), the second generator 250 takes the generated plurality of first transformed data as input data and generates a plurality of first reconstructed data that match the attributes of the plurality of first time series analysis sub-data.

The first time series analysis sub-data may have its own attributes or characteristics, and step S505 may take as input data the plurality of first transformed data generated in step S504. Step S505 may generate the plurality of first reconstructed data by imposing the characteristics of the plurality of first time series analysis sub-data on the received first transformed data. Generating the first reconstructed data may yield a loss function value between the plurality of first time series analysis sub-data and the plurality of first reconstructed data, which may express the degree of error or difference between the two.

In the step of discriminating the first transformed data (S506), the discriminator 260 takes the plurality of first transformed data generated in step S504 and the plurality of second time series analysis sub-data as input values and discriminates the first transformed data.

Step S506 may take the plurality of first transformed data generated in step S504 and the plurality of second time series analysis sub-data as input values. It may determine whether a given input is first transformed data or genuine second time series analysis sub-data, and this discrimination may yield a loss function value, which may express the degree of error or difference between the first transformed data and the second time series analysis sub-data.

In the step of calculating the similarity (S507), the similarity calculation unit 270 obtains three values:

- a generator loss value (Loss), which is the loss function value of the step of generating the first transformed data (S504);
- a discriminator loss value (Loss), which is the loss function value of the step of discriminating the first transformed data (S506); and
- a cycle loss value (Loss), which is the loss function value between the plurality of first time series analysis sub-data and the plurality of first reconstructed data.

Based on these values, it calculates the similarity between the first time series data and the second time series data.

Step S507 may calculate the similarity between the first time series data and the second time series data from the generator loss value, the discriminator loss value of step S506, and the cycle loss value. It may assign a weight to each of these three values. The similarity value may measure how alike the first time series analysis data and the second time series analysis data are, and it behaves like a distance. The lower the calculated value, the smaller the difference between the two compared series, and therefore the higher their similarity. Conversely, the higher the calculated value, the larger the difference and the lower their similarity.
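The weighted combination in step S507 might look like the sketch below. The weights, their default values of 1.0, and the `LossTriple` container are hypothetical. The text states only that each of the three loss values may be weighted and that a lower result indicates higher similarity. It does not say whether the discriminator term should enter with a positive or a negative weight, so the sketch accepts any real-valued weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossTriple:
    """The three loss values of step S507 for one input order."""
    generator: float
    discriminator: float
    cycle: float


def similarity_score(losses: LossTriple,
                     w_gen: float = 1.0,
                     w_disc: float = 1.0,
                     w_cycle: float = 1.0) -> float:
    """Weighted sum of the three losses; a lower score means more similar.

    The weights are hypothetical tuning parameters. Any real value is accepted,
    including a negative w_disc if a higher discriminator loss should count
    as evidence of similarity.
    """
    return (w_gen * losses.generator
            + w_disc * losses.discriminator
            + w_cycle * losses.cycle)
```

For example, `similarity_score(LossTriple(0.7, 1.3, 0.05), w_cycle=10.0)` is 0.7 + 1.3 + 10 × 0.05 = 2.5.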

Step S507 also obtains a second generator loss value, discriminator loss value of step S506, and cycle loss value, this time with the first time series data and the second time series data input in reverse order. It then calculates the similarity between the first time series data and the second time series data from the six loss values obtained.

When the first time series data and the second time series data are input in reverse order, the input data of step S504 changes. The section dividing step (S503) generates both the first and the second time series analysis sub-data. In the reverse case, step S504 may take the second time series analysis sub-data as input data and generate a plurality of first transformed data matching the attributes of the plurality of first time series analysis sub-data. The step of generating the first reconstructed data (S505) may then take these first transformed data as input data and generate a plurality of first reconstructed data matching the attributes of the plurality of second time series analysis sub-data. Likewise, step S506 may distinguish the first transformed data using, as input values, the plurality of first transformed data generated in step S504 and the plurality of first time series analysis sub-data.

Step S507 can therefore obtain two sets of loss values. The first set consists of the generator loss value, the discriminator loss value of step S506, and the cycle loss value when the first time series analysis sub-data are the input data of step S504. The second set consists of the same three values when the second time series analysis sub-data are the input data. The similarity between the first time series data and the second time series data may be calculated from the six loss values obtained in this way.
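One way to turn the six loss values into a single score is sketched below. Each direction's three losses are combined with the same weights, and the two directional scores are averaged, so the result does not depend on input order. The averaging rule, the weights, and the names `weighted` and `symmetric_score` are assumptions. The text says only that all six values are used.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LossTriple:
    """Generator, discriminator, and cycle loss for one input order."""
    generator: float
    discriminator: float
    cycle: float


Weights = Tuple[float, float, float]  # (generator, discriminator, cycle)


def weighted(losses: LossTriple, w: Weights) -> float:
    """Weighted sum of one direction's three losses."""
    return (w[0] * losses.generator
            + w[1] * losses.discriminator
            + w[2] * losses.cycle)


def symmetric_score(forward: LossTriple, reverse: LossTriple,
                    w: Weights = (1.0, 1.0, 1.0)) -> float:
    """Combine all six losses by averaging the two directional scores.

    forward: first series fed to the first generator (S504).
    reverse: second series fed to the first generator instead.
    """
    return 0.5 * (weighted(forward, w) + weighted(reverse, w))
```

Because the two directions are averaged, `symmetric_score(a, b) == symmetric_score(b, a)` for any pair of loss triples. Comparing stock A with stock B therefore gives the same score as comparing B with A.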

Examples of computer-readable recording media include:

- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical recording media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks; and
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include machine code such as that produced by a compiler. They also include high-level language code that a computer can execute using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the above description refers to embodiments, those skilled in the art may variously modify and change the present invention without departing from the spirit and scope of the present invention as set forth in the following claims.

210: time series data receiving unit
220: time series data processing unit
230: section dividing unit
240: first generator
250: second generator
260: discriminator
270: similarity calculation unit

Claims

A time series data similarity calculation system, comprising:
a time series data receiving unit configured to receive first time series data for a first item and second time series data for a second item;
a time series data processing unit configured to process the received first time series data and second time series data to generate first time series analysis data and second time series analysis data;
a section dividing unit configured to divide the first time series analysis data and the second time series analysis data into a plurality of sections, respectively, to generate a plurality of first time series analysis sub-data and a plurality of second time series analysis sub-data;
a first generator configured to use the plurality of first time series analysis sub-data as input data to generate a plurality of first transformed data matching attributes of the plurality of second time series analysis sub-data;
a second generator configured to use the generated plurality of first transformed data as input data to generate a plurality of first reconstructed data matching attributes of the plurality of first time series analysis sub-data;
a discriminator configured to distinguish the first transformed data using, as input values, the plurality of first transformed data generated by the first generator and the plurality of second time series analysis sub-data; and
a similarity calculation unit configured to calculate a similarity between the first time series data and the second time series data based on a generator loss value (Loss), which is a loss function value of the first generator, a discriminator loss value (Loss), which is a loss function value of the discriminator, and a cycle loss value (Loss), which is a loss function value between the plurality of first time series analysis sub-data and the plurality of first reconstructed data.

The system of claim 1,
wherein the first time series data and the second time series data include daily price information of the first stock and daily price information of the second stock, respectively, and
the time series data processing unit
generates the first time series analysis data from daily cumulative return values of the first stock calculated relative to the first-day price in the first time series data, and generates the second time series analysis data from daily cumulative return values of the second stock calculated relative to the first-day price in the second time series data.
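Claim 2 derives the analysis data from daily cumulative returns measured against the first day's price. The claim gives no formula, so this sketch reads it as R_t = P_t / P_0 - 1, where P_0 is the first-day price. That reading is an interpretation, and `cumulative_returns` is a hypothetical name.

```python
from typing import List, Sequence


def cumulative_returns(prices: Sequence[float]) -> List[float]:
    """Daily cumulative return relative to the first day: P_t / P_0 - 1."""
    if not prices:
        return []
    base = prices[0]  # first-day price P_0
    if base <= 0:
        raise ValueError("first-day price must be positive")
    return [p / base - 1.0 for p in prices]
```

Daily prices of 100, 110, and 99 map to cumulative returns of about 0.0, 0.1, and -0.01.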

The system of claim 2,
wherein the section dividing unit
sets the start dates of the sections at successive increments of a first reference value from the first day of the first time series analysis data, and generates, from each start date, first time series analysis sub-data spanning a length equal to a second reference value, and
sets the start dates of the sections at successive increments of the first reference value from the first day of the second time series analysis data, and generates, from each start date, second time series analysis sub-data spanning a length equal to the second reference value.

The system of claim 3,
wherein the first reference value is less than the second reference value.
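Claims 3 and 4 describe sections that start every first-reference-value days and run for second-reference-value days. A first reference value smaller than the second therefore yields overlapping sections. Below is a minimal sliding-window sketch. It uses the hypothetical parameter names `stride` (first reference value) and `length` (second reference value). It also assumes that a trailing start date too close to the end to fill a whole section is skipped.

```python
from typing import List, Sequence


def split_sections(series: Sequence[float], stride: int, length: int) -> List[List[float]]:
    """Cut a series into sections of `length` days, starting every `stride` days.

    stride ~ first reference value, length ~ second reference value (hypothetical
    names). Start dates too close to the end to fill a full section are skipped.
    """
    if stride <= 0 or length <= 0:
        raise ValueError("stride and length must be positive")
    last_start = len(series) - length
    return [list(series[s:s + length]) for s in range(0, last_start + 1, stride)]
```

With `stride=2` and `length=4`, a 10-day series yields four overlapping sections starting on days 0, 2, 4, and 6.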

The system of claim 3,
wherein the section dividing unit
generates the first time series analysis sub-data and the second time series analysis sub-data as values normalized to the range from -1.0 to 1.0 inclusive.
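Claim 5 requires sub-data values between -1.0 and 1.0 but does not name the normalization. Min-max scaling of each section is one option and is assumed here; `scale_to_unit_range` is a hypothetical name.

```python
from typing import List, Sequence


def scale_to_unit_range(section: Sequence[float]) -> List[float]:
    """Min-max scale one section into [-1.0, 1.0] (assumed normalization).

    A flat section (all values equal) maps to all zeros.
    """
    if not section:
        raise ValueError("section must not be empty")
    lo, hi = min(section), max(section)
    if hi == lo:
        return [0.0] * len(section)
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in section]
```

Because it rescales each section independently, this approach keeps the shape of the return path but not its absolute level. Dividing by the largest absolute value would instead preserve the sign of each return.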

The system of claim 1,
wherein the similarity calculation unit
further calculates a generator loss value, a discriminator loss value, and a cycle loss value for the case in which the first time series data and the second time series data are input in reverse order, and calculates the similarity between the first time series data and the second time series data using the six loss values thus obtained.

A time series data similarity calculation method performed in a time series data similarity calculation system having a central processing unit and a memory, the method comprising:
receiving first time series data for a first stock and second time series data for a second stock;
processing the received first time series data and second time series data to generate first time series analysis data and second time series analysis data;
dividing the first time series analysis data and the second time series analysis data into a plurality of sections, respectively, to generate a plurality of first time series analysis sub-data and a plurality of second time series analysis sub-data;
generating, using the plurality of first time series analysis sub-data as input data, a plurality of first transformed data matching attributes of the plurality of second time series analysis sub-data;
generating, using the generated plurality of first transformed data as input data, a plurality of first reconstructed data matching attributes of the plurality of first time series analysis sub-data;
distinguishing the first transformed data using, as input values, the plurality of first transformed data generated in the step of generating the first transformed data and the plurality of second time series analysis sub-data; and
calculating a generator loss value (Loss), which is a loss function value of the step of generating the first transformed data, a discriminator loss value (Loss), which is a loss function value of the distinguishing step, and a cycle loss value (Loss), which is a loss function value between the plurality of first time series analysis sub-data and the plurality of first reconstructed data, and calculating a similarity between the first time series data and the second time series data based on these values.

The method of claim 7,
wherein the first time series data and the second time series data include daily price information of the first stock and daily price information of the second stock, respectively, and
the step of processing the time series data comprises
generating the first time series analysis data from daily cumulative return values of the first stock calculated relative to the first-day price in the first time series data, and generating the second time series analysis data from daily cumulative return values of the second stock calculated relative to the first-day price in the second time series data.

The method of claim 8,
wherein the step of dividing into sections comprises
setting the start dates of the sections at successive increments of a first reference value from the first day of the first time series analysis data, and generating, from each start date, first time series analysis sub-data spanning a length equal to a second reference value, and
setting the start dates of the sections at successive increments of the first reference value from the first day of the second time series analysis data, and generating, from each start date, second time series analysis sub-data spanning a length equal to the second reference value.

The method of claim 9,
wherein the first reference value is less than the second reference value.

The method of claim 9,
wherein the step of dividing into sections comprises
generating the first time series analysis sub-data and the second time series analysis sub-data as values normalized to the range from -1.0 to 1.0 inclusive.

The method of claim 7,
wherein the step of calculating the similarity comprises
further calculating a generator loss value, a discriminator loss value, and a cycle loss value for the case in which the first time series data and the second time series data are input in reverse order, and calculating the similarity between the first time series data and the second time series data using the six loss values thus obtained.

A recording medium on which a program for executing the method of any one of claims 7 to 12 is recorded.