KR102368875B1

KR102368875B1 - Method and apparatus for visualizing dataset associations

Info

Publication number: KR102368875B1
Application number: KR1020210094083A
Authority: KR
Inventors: 신수미; 육진희; 황윤영; 문영수; 최기석
Original assignee: 한국과학기술정보연구원
Priority date: 2021-07-19
Filing date: 2021-07-19
Publication date: 2022-03-02

Abstract

Provided are a method and apparatus for visualizing associations between datasets. A method for visualizing associations between datasets according to an embodiment of the present invention includes the steps of: analyzing a plurality of datasets and identifying a first dataset and a second dataset having a first association; visualizing a first object indicating the first dataset and a second object indicating the second dataset, and connecting the first object and the second object with a connecting line as an expression of the first association; determining and visualizing a height of the connecting line based on the strength of the first association; analyzing a second association between the first dataset and the second dataset; and determining and visualizing a color or thickness of the connecting line based on the strength of the second association.

Description

Dataset association relationship visualization method and apparatus

The present invention relates to a method and apparatus for visualizing associations between datasets. More specifically, it relates to a method and apparatus that express and visualize the associations between datasets using graphic elements.

Data visualization means presenting the results of data analysis visually so that users can understand them easily. Infographics, which summarize a large amount of data in a single image, are a representative data visualization method.

Recently, as interest in big data has grown, so has the need for big data visualization that depicts large amounts of information visually and delivers the necessary information efficiently and clearly. However, only fragmentary information is currently being visualized, for example by showing the amount of big data collected over time as a bar graph or by marking on a map the locations where big data was collected.

Meanwhile, big data contains many datasets, yet attempts to analyze the associations between these datasets are lacking. Moreover, no model has been developed for how to visualize the associations between the datasets contained in big data.

Korean Laid-open Patent Publication No. 10-2020-0102238 (published August 31, 2020)

A technical problem to be solved by the present invention is to provide a dataset association visualization method and apparatus that visualize a plurality of associations between datasets with various graphic elements, so that the results of the association analysis can be understood intuitively.

Another technical problem to be solved by the present invention is to provide a dataset association visualization method and apparatus capable of analyzing the various associations between datasets more accurately.

Yet another technical problem to be solved by the present invention is to provide a dataset association visualization method and apparatus that visualize a plurality of associations between datasets in a single chart.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned here will be clearly understood by those skilled in the art from the description below.

To solve the above technical problems, a dataset association visualization method according to an embodiment of the present invention may include: analyzing a plurality of datasets to identify a first dataset and a second dataset having a first association; visualizing a first object indicating the first dataset and a second object indicating the second dataset, and connecting the first object and the second object with a connecting line as an expression of the first association; determining and visualizing a height of the connecting line based on the strength of the first association; analyzing a second association between the first dataset and the second dataset; and determining and visualizing a color or thickness of the connecting line based on the strength of the second association.
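The mapping from the two association strengths to the appearance of the connecting line can be sketched in Python as follows. This is only an illustration under stated assumptions: the strengths are taken to be normalized to the range 0 to 1, a stronger first association is assumed to give a higher line, and the names `EdgeStyle` and `style_edge`, the pixel ranges, and the grey-to-red color ramp are hypothetical choices that the description does not specify. The method requires color or thickness; the sketch sets both for simplicity.

```python
from dataclasses import dataclass


@dataclass
class EdgeStyle:
    height: float     # how far the connecting line rises (arbitrary units)
    thickness: float  # stroke width of the connecting line
    color: str        # hex color of the connecting line


def clamp01(x: float) -> float:
    """Clamp a strength value into the assumed range [0, 1]."""
    return max(0.0, min(1.0, x))


def style_edge(first_strength: float, second_strength: float,
               max_height: float = 100.0,
               min_width: float = 1.0, max_width: float = 6.0) -> EdgeStyle:
    s1 = clamp01(first_strength)
    s2 = clamp01(second_strength)
    # First association strength -> line height (assumed: stronger = higher).
    height = s1 * max_height
    # Second association strength -> thickness and color.
    thickness = min_width + s2 * (max_width - min_width)
    shade = round(200 * (1.0 - s2))          # green/blue fade out as s2 grows
    red = 200 + round(55 * s2)               # red rises from 200 to 255
    color = f"#{red:02x}{shade:02x}{shade:02x}"
    return EdgeStyle(height, thickness, color)
```

With these assumptions, a pair whose two strengths are both 1.0 is drawn as the highest, thickest, pure red line, and a pair whose strengths are both 0.0 as a flat, thin, light grey line.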

In an embodiment, the first association is a space-based association, and the method may further include calculating a distance between a spatial value included in the first dataset and a spatial value included in the second dataset, and computing the strength of the first association between the first dataset and the second dataset using the calculated distance.
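One possible way to turn a distance between two spatial values into a strength is sketched below. The description only states that the strength is computed from the calculated distance; the use of the haversine great-circle distance on latitude and longitude coordinates, the decay formula, and the `scale_km` parameter are assumptions made for illustration.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def spatial_strength(p1: tuple, p2: tuple, scale_km: float = 50.0) -> float:
    """Assumed formula: 1.0 at zero distance, falling towards 0 as distance grows."""
    distance = haversine_km(p1[0], p1[1], p2[0], p2[1])
    return 1.0 / (1.0 + distance / scale_km)
```

Under this assumed formula, two datasets collected at the same place have strength 1.0, and the strength halves when the distance equals `scale_km`.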

In an embodiment, the second association is a time-based association, and analyzing the second association may include calculating a difference between a time value included in the first dataset and a time value included in the second dataset, and computing the strength of the second association between the first dataset and the second dataset using the difference.
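A time-based strength could be derived from the time difference as in the following sketch. The decay formula and the `scale_days` parameter are assumptions for illustration; the description states only that the strength is computed from the difference between the two time values.

```python
from datetime import datetime


def temporal_strength(t1: datetime, t2: datetime, scale_days: float = 30.0) -> float:
    """Assumed formula: 1.0 for identical times, 0.5 when the gap equals scale_days."""
    diff_days = abs((t1 - t2).total_seconds()) / 86400.0
    return 1.0 / (1.0 + diff_days / scale_days)
```

For example, with the default scale, two datasets created exactly 30 days apart would receive a strength of 0.5.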

In an embodiment, the second association is an item-based association, and analyzing the second association may include analyzing the degree of matching between the items included in the first dataset and the items included in the second dataset, and computing the strength of the second association between the first dataset and the second dataset using the analyzed degree of matching.
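The degree of matching between two sets of items can be measured in several ways. The sketch below uses the Jaccard index (shared items divided by all distinct items), which is an assumption chosen for illustration rather than a measure named in the description; the function name `item_match_strength` is likewise hypothetical.

```python
def item_match_strength(items_a, items_b) -> float:
    """Jaccard index of two item-name collections; 0.0 if both are empty."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

With this measure, datasets that share every item score 1.0 and datasets with no item in common score 0.0.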

In an embodiment, the method may further include analyzing the plurality of datasets to group datasets having the same subject, and visualizing the objects indicating the grouped datasets with the same graphic representation.
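Grouping by subject and assigning one graphic representation per group might look like the following sketch. It assumes that a subject label has already been determined for each dataset and that the shared graphic representation is a fill color; the palette and the names `group_by_subject` and `colors_by_group` are hypothetical.

```python
from collections import defaultdict

# Hypothetical palette; any set of distinguishable colors would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def group_by_subject(subjects: dict) -> dict:
    """Map each subject label to the list of dataset titles that carry it."""
    groups = defaultdict(list)
    for title, subject in subjects.items():
        groups[subject].append(title)
    return dict(groups)


def colors_by_group(groups: dict) -> dict:
    """Give every dataset in the same subject group the same color."""
    colors = {}
    for i, subject in enumerate(sorted(groups)):
        for title in groups[subject]:
            colors[title] = PALETTE[i % len(PALETTE)]
    return colors
```

The palette wraps around when there are more subjects than colors, so a real chart with many subjects would need a larger palette or an additional visual cue.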

The connecting with the connecting line may include visualizing the first object and the second object on a ring-shaped chart. On the ring-shaped chart, the objects may be arranged at equal intervals.
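Placing objects at equal intervals on a ring amounts to dividing the full circle into equal angles. The sketch below computes such coordinates; the function name `ring_positions`, the starting angle of zero, and the counter-clockwise order are assumptions for illustration.

```python
import math


def ring_positions(n: int, radius: float = 1.0) -> list:
    """Return n (x, y) points spaced at equal angles on a circle around the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]
```

Each dataset object would then be drawn at one of the returned points, and connecting lines would be drawn between the points of associated datasets.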

To solve the above technical problems, a visualization modeling apparatus according to another embodiment of the present invention may include: a storage unit that stores a plurality of datasets; an association analysis unit that extracts a first dataset and a second dataset from the storage unit and analyzes a first association and a second association between the extracted first dataset and second dataset; and a dataset visualization unit that visualizes a first object indicating the first dataset and a second object indicating the second dataset, connects the first object and the second object with a connecting line as an expression of the first association, determines a height of the connecting line based on the strength of the first association, and determines and visualizes a color or thickness of the connecting line based on the strength of the second association.

To solve the above technical problems, a computing device according to yet another embodiment of the present invention includes one or more processors, a memory into which a program executed by the processors is loaded, and a storage in which the program is stored. The program may include instructions for performing: an operation of analyzing a plurality of datasets to select a first dataset and a second dataset having a first association; an operation of visualizing a first object indicating the first dataset and a second object indicating the second dataset, and connecting the first object and the second object with a connecting line as an expression of the first association; an operation of determining and visualizing a height of the connecting line based on the strength of the first association; an operation of analyzing a second association between the first dataset and the second dataset; and an operation of determining and visualizing a color or thickness of the connecting line based on the strength of the second association.

To solve the above technical problems, in a non-transitory computer-readable storage medium containing instructions according to yet another embodiment of the present invention, the instructions, when executed by a processor, may cause the processor to perform operations including: analyzing a plurality of datasets to select a first dataset and a second dataset having a first association; visualizing a first object indicating the first dataset and a second object indicating the second dataset, and connecting the first object and the second object with a connecting line as an expression of the first association; determining and visualizing a height of the connecting line based on the strength of the first association; analyzing a second association between the first dataset and the second dataset; and determining and visualizing a color or thickness of the connecting line based on the strength of the second association.

FIG. 1 is a block diagram of a visualization modeling apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of a dataset association visualization method according to another embodiment of the present invention.
FIG. 3 is a diagram for describing step S300 of FIG. 2 in detail.
FIG. 4 is a diagram for describing step S400 of FIG. 2 in detail.
FIG. 5 is a diagram for describing step S500 of FIG. 2 in detail.
FIG. 6 is a diagram illustrating a chart in which the relationships between datasets are visualized.
FIG. 7 is a diagram illustrating detailed information on the associations between datasets.
FIG. 8 is an exemplary hardware configuration diagram capable of implementing a computing device in various embodiments.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the technical idea of the present invention is not limited to the following embodiments and may be implemented in various different forms. The following embodiments are provided only to make the technical idea of the present invention complete and to fully inform those of ordinary skill in the art to which the present invention belongs of the scope of the invention, and the technical idea of the present invention is defined only by the scope of the claims.

In assigning reference numerals to the components of each drawing, note that the same components are given the same reference numerals wherever possible, even when they appear in different drawings. In addition, in describing the present invention, detailed descriptions of related known configurations or functions are omitted where they are judged likely to obscure the gist of the present invention.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless they are clearly and specifically defined. The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise.

In addition, terms such as first, second, A, B, (a), and (b) may be used in describing the components of the present invention. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the components. When a component is described as being "connected", "coupled", or "joined" to another component, the component may be directly connected or joined to that other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between the two.

As used in this specification, "comprises" and/or "comprising" means that the stated components, steps, operations, and/or elements do not exclude the presence or addition of one or more other components, steps, operations, and/or elements.

In this specification, a dataset (data set) may be a collection of data that can be used by a computing device.

Hereinafter, some embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram of a visualization modeling apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the visualization modeling apparatus 10 may include a data collection unit 11, an original dataset storage unit 12, a processed dataset storage unit 13, a data processing unit 14, an association analysis unit 15, and a dataset visualization unit 16. These components may be implemented in hardware, in software, or in a combination of hardware and software. As described later, the visualization modeling apparatus 10 may be implemented as a computing device including a memory and a processor.

The original dataset storage unit 12 is a storage means, such as a storage device, and may store a number of original datasets that have not been processed.

The processed dataset storage unit 13 is a storage means, such as a storage device, and may store processed datasets obtained by processing the original datasets. More specifically, compared with the original dataset stored in the original dataset storage unit 12, a processed dataset stored in the processed dataset storage unit 13 may differ in one or more of its items, time value, spatial value, and organization name. In addition, some or all of the items of the original dataset may be included in the processed dataset.

The data collection unit 11 may collect the necessary data (for example, datasets) from an external device or database and store it in the original dataset storage unit 12. The data collection unit 11 may collect big data containing a number of datasets. A dataset may include the title of the dataset, one or more items, metadata, a time value, and a spatial value. Here, the title of the dataset is information for identifying the dataset and may be the name given to the dataset. An item is the title of a field in which data is recorded, and may be, for example, precipitation, rainfall, fine dust concentration, ultrafine dust concentration, carbon dioxide concentration, ozone concentration, traffic volume, temperature, humidity, wind direction, wind speed, flow velocity, or hydrological quantity. The metadata is data that describes the dataset and may include one or more of the name of the organization providing the dataset, keywords of the dataset, the classification scheme of the dataset, and a descriptive text for the dataset. The time value may be the date and time at which the dataset was created, or the time range taken to create the dataset. The spatial value may be information on the place where the dataset was collected, and may include latitude and longitude coordinates, an administrative address, and the like.
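The dataset structure described above could be modeled as in the following sketch. The class and field names (`Dataset`, `Metadata`, `time_value`, `location`, `address`) are hypothetical, and the choice to represent the spatial value as an optional coordinate pair plus an optional address is an assumption for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Metadata:
    """Descriptive data about a dataset; every part is optional."""
    organization: Optional[str] = None      # name of the providing organization
    keywords: List[str] = field(default_factory=list)
    classification: Optional[str] = None    # classification scheme
    description: Optional[str] = None       # descriptive text


@dataclass
class Dataset:
    title: str                                       # name identifying the dataset
    items: List[str]                                 # field titles, e.g. "temperature"
    metadata: Metadata
    time_value: Optional[datetime] = None            # creation date and time
    location: Optional[Tuple[float, float]] = None   # (latitude, longitude)
    address: Optional[str] = None                    # administrative address
```

A time range, which the description also allows as a time value, would need an additional field or a pair of timestamps; it is left out here to keep the sketch short.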

The data processing unit 14 may extract only the items needed for analysis from the datasets stored in the original dataset storage unit 12, and store processed datasets containing only the extracted items in the processed dataset storage unit 13. The data processing unit 14 may also change one or more of the time value, spatial value, organization name, and item names of an original dataset stored in the original dataset storage unit 12 into a standard format (or standard name) and store the result in the processed dataset storage unit 13.

The association analysis unit 15 may analyze the associations between the datasets stored in the processed dataset storage unit 13 and store the strengths of the analyzed associations in the processed dataset storage unit 13. As described later, the association analysis unit 15 may analyze the associations between datasets according to space, time, item, or subject, and store the analysis results (that is, the strengths of the associations) in the processed dataset storage unit 13.

The dataset visualization unit 16 may visualize the various associations between datasets using graphic elements and display them as a single chart. In an embodiment, as described later, the dataset visualization unit 16 may select, from among the datasets, a first dataset and a second dataset having a first association, and then visualize a first object indicating the first dataset and a second object indicating the second dataset. In an embodiment, the dataset visualization unit 16 may connect the first object and the second object with a connecting line, and determine and visualize the height of the connecting line based on the strength of the first association between the first dataset and the second dataset. In another embodiment, the dataset visualization unit 16 may determine and visualize the color or thickness of the connecting line based on the strength of a second association between the first dataset and the second dataset.

If the analysis by the association analysis unit 15 determines that the first dataset and the second dataset have the same subject, the dataset visualization unit 16 may visualize the first object and the second object in the same color. If the first dataset and the second dataset are determined to have different subjects, it may visualize the first object and the second object in different colors.

FIG. 2 is a flowchart of a dataset association visualization method according to another embodiment of the present invention.

Each step of the method shown in FIG. 2 may be performed by a computing device. In other words, each step of the method may be implemented as one or more instructions executed by a processor of a computing device. First steps of the method may be performed by a first computing device, and second steps of the method may be performed by a second computing device. In the following, the description assumes that each step of the method is performed by the visualization modeling apparatus 10 described with reference to FIG. 1. However, the entity performing each step is merely an example, the present invention is not limited by the following description, and for convenience of description the entity performing some steps of the method may be omitted.

Referring to FIG. 2, the data collection unit 11 may work with an external server or device to collect big data containing a number of datasets, and store those datasets in the original dataset storage unit 12 (S100). In an embodiment, the data collection unit 11 may receive in advance the address of an external server from which big data can be collected, and may access the server at that address to collect the big data. The data collection unit 11 may also assign an index to each dataset and then store the dataset, including the index, in the original dataset storage unit 12.
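The index assignment in step S100 can be sketched as follows. Access to the external server is omitted, and the class name `OriginalDatasetStore`, the in-memory dictionary, and the choice of sequential integers starting at 1 are assumptions for illustration only.

```python
from itertools import count


class OriginalDatasetStore:
    """In-memory stand-in for the original dataset storage unit."""

    def __init__(self):
        self._next_index = count(1)   # assumed: sequential indices starting at 1
        self.records = {}

    def add(self, dataset: dict) -> int:
        """Assign an index to the dataset and store a copy that includes it."""
        index = next(self._next_index)
        self.records[index] = {**dataset, "index": index}
        return index
```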

Next, the data processing unit 14 may extract the data to be analyzed from the datasets stored in the original dataset storage unit 12 (S200). From each dataset, the data processing unit 14 may extract the title of the dataset, the time value, the spatial value, the metadata, and the items as the analysis target data. That is, of the data contained in a dataset, only the data used for analysis is selected and extracted.
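Step S200 keeps only the fields used for analysis. A minimal sketch, assuming each raw dataset is a dictionary and using hypothetical key names for the five kinds of analysis target data:

```python
# Hypothetical key names for the title, time value, spatial value,
# metadata, and items of a dataset.
ANALYSIS_FIELDS = ("title", "time_value", "spatial_value", "metadata", "items")


def extract_analysis_data(raw: dict) -> dict:
    """Return a new dict holding only the analysis fields present in raw."""
    return {key: raw[key] for key in ANALYSIS_FIELDS if key in raw}
```

Any other keys of the raw dataset, such as the actual measurement rows, are simply dropped from the result.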

Next, the data processing unit 14 may process one or more of the time value, spatial value, organization name, and item names in the extracted analysis target data into a preset standard format or standard name, and store in the processed dataset storage unit 13 a processed dataset that contains the analysis target data with the time value, spatial value, organization name, or item names so processed (S300). Step S300, in which the analysis target data is processed and the processed dataset is stored, will be described in more detail with reference to FIG. 3.

Next, the association analysis unit 15 may analyze the associations between the plurality of datasets stored in the processed dataset storage unit 13 and store the strengths of the analyzed associations in the processed dataset storage unit 13 (S400). The association analysis unit 15 may analyze the associations between the datasets based on one or more of the spatial value, the time value, the items, and the subject. Step S400, in which the associations between datasets are analyzed, will be described in more detail with reference to FIG. 4.
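Analyzing the associations among many datasets means evaluating every pair. The sketch below shows one way to do that with a pluggable strength function; the name `analyze_pairs`, the threshold parameter, and the toy strength function (the number of shared items) are assumptions for illustration and not the analysis prescribed for step S400.

```python
from itertools import combinations


def analyze_pairs(datasets: dict, strength_fn, threshold: float = 0.0) -> dict:
    """Compute a strength for every unordered pair and keep those above threshold."""
    strengths = {}
    for (title_a, data_a), (title_b, data_b) in combinations(datasets.items(), 2):
        strength = strength_fn(data_a, data_b)
        if strength > threshold:
            strengths[(title_a, title_b)] = strength
    return strengths


# Toy example: each dataset is represented only by its set of item names.
example = {
    "rainfall": {"precipitation", "humidity"},
    "air quality": {"fine dust concentration", "humidity"},
    "traffic": {"traffic volume"},
}
shared = analyze_pairs(example, lambda a, b: len(a & b))
```

Only pairs with a strength above the threshold are kept, so datasets with no association produce no connecting line. For n datasets this evaluates n(n-1)/2 pairs, which may call for pruning or indexing when n is large.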

Next, the dataset visualization unit 16 may look up the associations between datasets in the processed dataset storage unit 13 and visualize them in a single chart using various graphic elements (S500). As described later, after a first dataset and a second dataset having a first association are selected from among the datasets, a first object indicating the first dataset and a second object indicating the second dataset may be visualized. In an embodiment, the dataset visualization unit 16 may connect the first object and the second object with a connecting line, and determine and visualize the height of the connecting line based on the strength of the first association between the first dataset and the second dataset. In another embodiment, the dataset visualization unit 16 may determine and visualize the color or thickness of the connecting line based on the strength of a second association between the first dataset and the second dataset. Step S500, in which the associations between datasets are visualized, will be described in more detail with reference to FIG. 5.

According to this embodiment, the various associations between datasets are visualized and displayed in a single chart, so that the user can understand the associations between datasets intuitively. Users can therefore readily grasp trends or patterns in the datasets and clearly understand the message the datasets convey.

Hereinafter, step S300 of FIG. 2 will be described in detail with reference to FIG. 3.

Referring to FIG. 3, the data processing unit 14 may identify one or more item names in the extracted analysis target data (S305). Next, the data processing unit 14 may determine whether each identified item name is a preset standard item name (S310). The standard item names may be set in advance, and one or more names similar to each standard item name may be mapped to it and stored in advance. For example, "dust concentration in the atmosphere", "atmospheric dust concentration", and "suspended dust concentration" may be mapped as similar terms to the standard item name "fine dust concentration".

In response to the identified item name not being a standard item name, the data processing unit 14 may change it to the corresponding standard item name (S315). For example, when the identified item name is the non-standard name "dust concentration in the atmosphere", it may be changed to the standard item name "fine dust concentration".
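As a minimal sketch of steps S310 and S315, assuming the mapping of similar names is held in a simple in-memory table, the lookup might be written in Python as follows. The table contents, the function name, and the choice to leave unknown names unchanged are illustrative assumptions; the description does not specify them.

```python
# Hypothetical tables: the description only says that similar names are
# mapped to a standard item name in advance.
STANDARD_ITEM_NAMES = {"fine dust concentration"}
ITEM_SYNONYMS = {
    "dust concentration in the atmosphere": "fine dust concentration",
    "atmospheric dust concentration": "fine dust concentration",
    "suspended dust concentration": "fine dust concentration",
}


def standardize_item_name(name: str) -> str:
    """Return the standard item name for `name` (S310: check, S315: change)."""
    key = name.strip().lower()
    if key in STANDARD_ITEM_NAMES:  # S310: already a standard item name
        return key
    # S315: replace a known similar name; unknown names are returned unchanged
    # (an assumption, since the description does not cover this case).
    return ITEM_SYNONYMS.get(key, name)
```

The same table-lookup approach would apply to organization names in steps S325 and S330.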

Next, the data processing unit 14 may identify an organization name from the metadata of the analysis target data (S320). The data processing unit 14 may determine whether the identified organization name is a preset standard organization name (S325). The standard organization names may be set in advance, and one or more organization names similar to each standard organization name may be mapped to it and stored in advance. For example, the abbreviated name "부산대" may be mapped and recorded as a similar organization name for the standard organization name "부산대학교" (Pusan National University).

In response to the identified organization name not being a standard organization name, the data processing unit 14 may change it to the corresponding standard organization name (S330). For example, when the identified organization name is the non-standard name "부산대", it may be changed to the standard organization name "부산대학교" (Pusan National University).

Next, the data processing unit 14 may identify a spatial value from the analysis target data (S335). The data processing unit 14 may determine whether the format of the identified spatial value is the preset standard format for spatial values (S340). That is, the data processing unit 14 may determine whether the notation of the identified spatial value conforms to the preset standard notation for spatial values. For example, the standard format for spatial values may be an administrative address format that includes a road name.

In response to the format of the identified spatial value not being the standard format, the data processing unit 14 may convert the identified spatial value to the standard format (S345). For example, when the identified spatial value is given as latitude and longitude coordinates, the data processing unit 14 may convert the coordinates into an administrative address format that includes a road name.

Next, the data processing unit 14 may identify a time value from the analysis target data (S350). The data processing unit 14 may determine whether the format of the identified time value is the preset standard format for time values (S355). That is, the data processing unit 14 may determine whether the notation of the identified time value conforms to the preset standard notation for time values. For example, the standard format for time values may follow year, month, day order, with a four-digit year, a two-digit month, and a two-digit day.

In response to the format of the identified time value not being the standard format, the data processing unit 14 may convert the identified time value to the standard format for time values (S360). For example, a time value in month, day, year order may be converted to a time value in year, month, day order.
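The conversion in steps S355 and S360 can be illustrated with Python's standard `datetime` module. This is only a sketch under stated assumptions: the hyphen separator in the standard format and the list of recognized input formats are hypothetical, since the description fixes only the field order and digit counts.

```python
from datetime import datetime

# Assumed standard format: four-digit year, two-digit month, two-digit day.
STANDARD_FORMAT = "%Y-%m-%d"
# Hypothetical list of input notations the converter recognizes.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y.%m.%d"]


def standardize_time_value(raw: str) -> str:
    """Parse `raw` with the first matching known format and re-emit it
    in the standard year, month, day order."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # not this notation; try the next one
        return parsed.strftime(STANDARD_FORMAT)
    raise ValueError(f"unrecognized time format: {raw!r}")
```

For instance, a month, day, year value such as "07/19/2021" would be rewritten in year, month, day order, while a value already in the standard format passes through unchanged.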

Next, the data processing unit 14 may store, in the processed dataset storage unit 13, a processed dataset containing the analysis target data with standard formats, standard organization names, and standard item names (S365).

According to this embodiment, the data to be analyzed is selected from the data included in a dataset, and the selected data is converted into a form that is easy to analyze, so that the relationship analysis between datasets described later can be performed more quickly and accurately.

Hereinafter, step S400 of FIG. 2 will be described in detail with reference to FIG. 4.

Referring to FIG. 4, the association analysis unit 15 may extract a plurality of datasets stored in the processed dataset storage unit 13, compare the spatial values included in the respective datasets to analyze the spatial association between them, and store the strength of that spatial association in the processed dataset storage unit 13 (S405). That is, the association analysis unit 15 may compute and store the strength of the spatial association between a first dataset and a second dataset using a first spatial value included in the first dataset and a second spatial value included in the second dataset. In one embodiment, the association analysis unit 15 may measure the distance between the first spatial value and the second spatial value and compute the strength of the spatial association according to the range in which the measured distance falls. For example, the strength may be "high" when the distance falls within a first distance range, "medium" when it falls within a second distance range, and "low" when it falls within a third distance range. Here, a strength closer to "high" means that the spatial values are closer together and the spatial association is stronger.
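The distance-to-strength mapping of step S405 might be sketched as follows. Several points are assumptions rather than part of the description: the spatial values are taken to be available as latitude and longitude pairs, the distance is measured as great-circle (haversine) distance, and the three range boundaries (10 km, 50 km, 200 km) are arbitrary placeholders. The strength labels are written as "high", "medium", and "low".

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(min(1.0, math.sqrt(a)))


def spatial_strength(distance_km, first_km=10.0, second_km=50.0, third_km=200.0):
    """Map a distance to a strength label; boundaries are hypothetical."""
    if distance_km <= first_km:
        return "high"    # first distance range
    if distance_km <= second_km:
        return "medium"  # second distance range
    if distance_km <= third_km:
        return "low"     # third distance range
    return None          # beyond all ranges: treated here as no association
```

Returning `None` beyond the third range is a design choice of this sketch; the description does not say how distances outside the three ranges are handled.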

Next, the association analysis unit 15 may analyze the temporal association between the datasets using the time values included in the respective datasets, compute the strength of that temporal association, and store it in the processed dataset storage unit 13 (S410). That is, the association analysis unit 15 may compare a first time value included in the first dataset with a second time value included in the second dataset, and compute and store the strength of the temporal association between the first dataset and the second dataset. In one embodiment, the association analysis unit 15 may calculate the difference between the first time value and the second time value and compute the strength of the temporal association according to the range in which the calculated difference falls. For example, the strength may be "high" when the difference falls within a first time range, "medium" when it falls within a second time range, and "low" when it falls within a third time range. Here, a strength closer to "high" means that the difference between the time values is smaller and the temporal association is stronger.
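A compact way to express the range lookup of step S410 is a sorted list of boundaries searched with `bisect`. The sketch below assumes the time values are calendar dates and uses hypothetical boundaries of 30, 365, and 3650 days; neither assumption comes from the description.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical upper bounds (in days) of the first, second, and third ranges.
BOUNDS_DAYS = [30, 365, 3650]
# One label per range, plus None for differences beyond the third range.
LABELS = ["high", "medium", "low", None]


def temporal_strength(first: date, second: date):
    """Return the strength label for the gap between two dates."""
    gap_days = abs((first - second).days)
    # bisect_left finds the first range whose upper bound is >= gap_days.
    return LABELS[bisect_left(BOUNDS_DAYS, gap_days)]
```

Keeping the boundaries in a single list makes it straightforward to add or adjust ranges without changing the function body.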

Next, the association analysis unit 15 may analyze the degree to which the items included in the respective datasets match, compute the strength of the item association between the datasets, and store the computed strength in the processed dataset storage unit 13 (S415). In one embodiment, the association analysis unit 15 may compare the items included in the first dataset with the items included in the second dataset, compute the number or ratio of matching items, and, based on that number or ratio, compute and store the strength of the item association between the first dataset and the second dataset. For example, the strength may be "high" when the number or ratio of matching items falls within a first range, "medium" when it falls within a second range, and "low" when it falls within a third range. Here, a strength closer to "high" means that more items match, or that the ratio of matching items is higher, and that the item association is stronger.
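The item comparison of step S415 can be illustrated with set operations. The description does not define the denominator of the "ratio of matching items", so this sketch assumes the ratio of shared items to all distinct items in both datasets (a Jaccard-style ratio); the range boundaries are likewise hypothetical.

```python
def item_match(items_a, items_b):
    """Return (number of matching items, ratio of matching items).

    The ratio is shared / union, which is an assumption of this sketch.
    """
    a, b = set(items_a), set(items_b)
    shared, union = a & b, a | b
    ratio = len(shared) / len(union) if union else 0.0
    return len(shared), ratio


def item_strength(ratio, first=0.6, second=0.3, third=0.1):
    """Map a match ratio to a strength label; boundaries are hypothetical."""
    if ratio >= first:
        return "high"
    if ratio >= second:
        return "medium"
    if ratio >= third:
        return "low"
    return None  # too few matching items: treated here as no association
```

Because the item names were standardized earlier in step S315, an exact set comparison is sufficient here.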

Subsequently, the association analysis unit 15 may identify the topic of each dataset and store the identified topic in the processed dataset storage unit 13 (S420). In one embodiment, the association analysis unit 15 may identify a first keyword and first metadata included in the first dataset, and identify the topic of the dataset by analyzing which topic, among the pre-stored per-topic keywords and metadata, the first keyword and first metadata match. The per-topic keywords and metadata may be stored in advance in the visualization modeling apparatus 10. The association analysis unit 15 may compute the similarity between the identified first keyword and each topic's keyword and the similarity between the identified first metadata and each topic's metadata, and then identify the topic whose keyword and metadata have the highest similarity as the topic of the dataset.

In one embodiment, the association analysis unit 15 may identify a second keyword and second metadata corresponding to a first topic, calculate the similarity between the first keyword and the second keyword and the similarity between the first metadata and the second metadata, and compute the strength of the association with that topic as a weighted sum of the keyword similarity and the metadata similarity. The topic with the highest score among the topic association strengths computed in this way may be identified as the topic of the dataset. The association analysis unit 15 may compute the keyword similarity and the metadata similarity using a morpheme match rate, a word match rate, or the like.
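The weighted-sum scoring described above might look like the following sketch. It assumes a simple word match rate (shared words over all distinct words) as the similarity measure and weights of 0.6 and 0.4; the description names "word match rate" as one option but gives no formula or weights, so both are illustrative.

```python
def word_match_rate(text_a: str, text_b: str) -> float:
    """Share of distinct words common to both texts (assumed definition)."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def topic_score(keyword, metadata, topic_keyword, topic_metadata,
                w_keyword=0.6, w_metadata=0.4):
    """Weighted sum of keyword similarity and metadata similarity."""
    return (w_keyword * word_match_rate(keyword, topic_keyword)
            + w_metadata * word_match_rate(metadata, topic_metadata))


def identify_topic(keyword, metadata, topics):
    """Pick the topic with the highest score.

    `topics` maps a topic name to a (topic_keyword, topic_metadata) pair.
    """
    return max(topics,
               key=lambda name: topic_score(keyword, metadata, *topics[name]))
```

For Korean text, a morpheme-level tokenizer would likely be needed in place of whitespace splitting, which is why the description also mentions a morpheme match rate.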

Through the process of FIG. 4, the spatial, temporal, item, and topic associations between the plurality of datasets may be analyzed.

Hereinafter, step S500 of visualizing the association relationships between datasets will be described in more detail with reference to FIGS. 5 and 6.

Referring to FIGS. 5 and 6, the dataset visualization unit 16 may visualize and display objects indicating the datasets stored in the processed dataset storage unit 13 (S505). In one embodiment, the dataset visualization unit 16 may visualize the plurality of datasets so that, for each dataset, its identification information (for example, its title) and a corresponding object indicating the dataset (represented as a tick mark in FIG. 6) are displayed. As illustrated in FIG. 6, the object indicating each dataset may be visualized as a tick mark on a ring-shaped chart. In FIG. 6, the identification information of each dataset is displayed outside the ring, next to its tick mark. The objects may be arranged at equal intervals on the ring-shaped chart.

Subsequently, the dataset visualization unit 16 may select, from among the datasets, those whose spatial association strength is equal to or greater than a preset strength, and connect the objects of the selected datasets with connecting lines, determining the height of each connecting line according to the strength of the spatial association (S510). For example, the dataset visualization unit 16 may visualize a first object indicating the first dataset and a second object indicating the second dataset, connect the first object and the second object with a connecting line as an expression of the spatial association between the two datasets, and determine and visualize the height of the connecting line based on the strength of the spatial association. In one embodiment, the connecting line may be a parabola, and the height of the parabola may increase in proportion to the strength of the spatial association.
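To make the parabola embodiment concrete, the sketch below samples points on a parabola whose peak height is proportional to the strength. For simplicity it works on a flat baseline between two x positions rather than on the ring of FIG. 6, and the numeric levels assigned to the strength labels and the unit height are assumptions.

```python
# Hypothetical numeric levels for the strength labels.
STRENGTH_LEVEL = {"low": 1, "medium": 2, "high": 3}


def parabola_points(x1, x2, strength, unit_height=10.0, samples=21):
    """Sample a parabola from (x1, 0) to (x2, 0).

    The peak height is unit_height * level, so it grows in proportion
    to the strength. `samples` must be at least 2.
    """
    peak = unit_height * STRENGTH_LEVEL[strength]
    points = []
    for i in range(samples):
        t = i / (samples - 1)          # 0.0 at the first object, 1.0 at the second
        x = x1 + t * (x2 - x1)
        y = 4 * peak * t * (1 - t)     # zero at both ends, `peak` at t = 0.5
        points.append((x, y))
    return points
```

The sampled points could then be handed to any drawing library as a polyline; on a ring-shaped chart the same height rule would be applied along the radial direction.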

As illustrated in FIG. 6, the objects of datasets that have a spatial association may be connected to each other by connecting lines, and the strength of the spatial association is expressed intuitively by the height of each connecting line. According to FIG. 6, the object of the dataset at point 61 and the object at point 62 are connected by the highest connecting line, and thus these two objects may have the strongest spatial association.

Subsequently, the dataset visualization unit 16 may identify the strength of the temporal association between the datasets connected by a connecting line, and determine and visualize the color of the connecting line according to that strength (S515). For example, when the strength of the temporal association between the first dataset and the second dataset is "high", the color of the connecting line between the first object and the second object may be set to a first color; when the strength is "medium", the color may be set to a second color; and when the strength is "low", the color may be set to a third color.

Next, the dataset visualization unit 16 may identify the strength of the item association between the datasets connected by a connecting line, and determine and visualize the thickness of the connecting line according to that strength (S520). For example, when the strength of the item association between the first dataset and the second dataset is "high", the thickness of the connecting line between the first object and the second object may be doubled; when the strength is "medium", the thickness may be left unchanged; and when the strength is "low", the thickness may be halved. As illustrated in FIG. 6, because the item association between the object of the dataset at point 61 and the object at point 62 is strong, their connecting line may be drawn relatively thick.
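Steps S515 and S520 amount to two lookups per connecting line, which might be sketched as follows. The thickness factors (doubled, unchanged, halved) follow the example in the text; the concrete color codes, the base width, and the `LineStyle` type are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical first, second, and third colors for temporal strength.
TEMPORAL_COLORS = {"high": "#d62728", "medium": "#ff7f0e", "low": "#7f7f7f"}
# Thickness factors taken from the example: doubled, unchanged, halved.
THICKNESS_FACTOR = {"high": 2.0, "medium": 1.0, "low": 0.5}


@dataclass
class LineStyle:
    color: str
    width: float


def line_style(temporal_strength: str, item_strength: str,
               base_width: float = 1.5) -> LineStyle:
    """Color encodes temporal strength (S515); width encodes item strength (S520)."""
    return LineStyle(
        color=TEMPORAL_COLORS[temporal_strength],
        width=base_width * THICKNESS_FACTOR[item_strength],
    )
```

Keeping color and width independent lets one connecting line carry the temporal and item associations at the same time, in addition to the spatial association encoded by its height.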

Next, the dataset visualization unit 16 may identify the topic of each dataset, identify the color corresponding to that topic, and visualize the object in the identified color (S525). According to FIG. 6, datasets belonging to a first topic are visualized in red, datasets belonging to a second topic in blue, and datasets belonging to a third topic in yellow. The dataset visualization unit 16 may also place the objects of datasets belonging to the same topic adjacent to each other. In the ring-shaped chart of FIG. 6, the objects of datasets belonging to the same topic are grouped next to one another.
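The placement of objects on the ring (equal intervals, same-topic objects adjacent, one color per topic) could be sketched as below. The topic names, the fallback color, and the output structure are assumptions; only the red, blue, and yellow assignment comes from the description of FIG. 6.

```python
import math

# Colors per topic as described for FIG. 6; the topic names are placeholders.
TOPIC_COLORS = {"topic 1": "red", "topic 2": "blue", "topic 3": "yellow"}


def ring_layout(datasets, radius=1.0):
    """Place (title, topic) pairs at equal angular intervals on a ring.

    Sorting by topic keeps datasets of the same topic adjacent.
    """
    if not datasets:
        return []
    ordered = sorted(datasets, key=lambda d: d[1])  # stable sort by topic
    step = 2 * math.pi / len(ordered)
    layout = []
    for i, (title, topic) in enumerate(ordered):
        angle = i * step
        layout.append({
            "title": title,
            "color": TOPIC_COLORS.get(topic, "gray"),  # gray: unknown topic
            "angle": angle,
            "x": radius * math.cos(angle),
            "y": radius * math.sin(angle),
        })
    return layout
```

Each returned entry gives the position of one tick mark; the title would be drawn just outside the ring at the same angle.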

According to this embodiment, the various association relationships are displayed in a single chart using different visual elements, so that the user can grasp the association relationships between datasets intuitively.

In still another embodiment of the present invention, when a particular connecting line is selected by the user, the dataset visualization unit 16 may display detailed information on the association relationships between the datasets connected by that connecting line.

FIG. 7 illustrates the detailed association information displayed when the connecting line 63 of FIG. 6 is selected.

When the user selects the connecting line denoted by reference numeral 63 in FIG. 6, the dataset visualization unit 16 may identify the objects 61 and 62 connected by the connecting line 63 and identify the datasets corresponding to those objects. According to FIG. 6, the object denoted by reference numeral 61 may indicate the dataset titled "하천별 통계 서비스" (statistics service by river), and the object denoted by reference numeral 62 may indicate the dataset titled "한국환경공단 오전황사 발생정보" (Korea Environment Corporation morning yellow dust occurrence information). The dataset visualization unit 16 may extract detailed association information for the two identified datasets, including the strength of each association and the data on which each strength is based (for example, spatial values, time values, and matching items), and display it in the form shown in FIG. 7.

As illustrated in FIG. 7, when the connecting line 63 is selected, the strength of the spatial association, the strength of the temporal association, and the strength of the item association between the "하천별 통계 서비스" dataset and the "한국환경공단 오전황사 발생정보" dataset connected by the connecting line 63 may be displayed, together with the spatial values and time values of the datasets, the items that match between them, and the topic of each dataset. In this way, detailed information on the association relationships between the datasets selected by the user is provided to the user.

In one embodiment, the dataset visualization unit 16 may display the detailed association information as a layer on top of the chart, or may display it after removing the chart from the screen.

As described above, by selecting a connecting line in the chart, the user can view detailed information on the association relationships between the datasets connected by that line.

In the embodiments described above, the spatial association is visualized as a connecting line, the temporal association as the color of the connecting line, the item association as the thickness of the connecting line, and the topic of a dataset as the color of its object. However, the present invention is not limited thereto, and it should be noted that the various association relationships may be visualized using graphic representations other than those described above. For example, the temporal association may be visualized as a connection between objects using a connecting line, the color of an object, the thickness of a connecting line, or another graphic representation. As another example, the spatial association may be visualized as the color of an object, the thickness of a connecting line, the color of a connecting line, or another graphic representation.

FIG. 8 is an exemplary hardware configuration diagram of a computing device that may be implemented in various embodiments.

The computing device 1000 according to this embodiment may include one or more processors 1100, a system bus 1600, a communication interface 1200, a memory 1400 into which a computer program 1500 executed by the processor 1100 is loaded, and a storage 1300 that stores the computer program 1500. FIG. 8 shows only the components relevant to the embodiment. Accordingly, those skilled in the art to which the embodiments of this specification pertain will understand that other general-purpose components may be included in addition to those shown in FIG. 8.

The processor 1100 controls the overall operation of each component of the computing device 1000. The processor 1100 may include at least one of a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphic Processing Unit), or any other type of processor well known in the technical field of this specification. The processor 1100 may also perform operations for at least one application or program for executing the methods/operations according to the various embodiments. The computing device 1000 may include two or more processors.

The memory 1400 stores various data, instructions, and/or information. The memory 1400 may load one or more programs 1500 from the storage 1300 in order to execute the methods/operations according to the various embodiments of this specification. An example of the memory 1400 is RAM, but the memory is not limited thereto.

The communication interface 1200 may communicate with external communication devices such as mobile communication terminals, personal computers, and servers over a network such as a mobile communication network or a wired Internet network. The communication interface 1200 may receive input information from a communication device.

The system bus 1600 provides communication between the components of the computing device 1000. The system bus 1600 may be implemented as various types of buses, such as an address bus, a data bus, and a control bus.

The storage 1300 may store one or more computer programs 1500 non-transitorily. The storage 1300 may include a non-volatile memory such as a flash memory, a hard disk, a removable disk, or any other type of computer-readable recording medium well known in the technical field to which the embodiments of this specification pertain. In addition, the storage 1300 may store the user's past transaction history and may also store the past transaction history of products.

The computer program 1500 may include one or more instructions that implement the methods/operations according to the various embodiments of this specification. When the computer program 1500 is loaded into the memory 1400, the processor 1100 may perform the methods/operations according to the various embodiments of this specification by executing the one or more instructions. The computer program 1500 may include instructions for the methods described with reference to FIGS. 2 to 7.

In one embodiment, the computer program 1500 may include instructions for performing: an operation of analyzing a plurality of datasets and selecting a first dataset and a second dataset having a first association relationship; an operation of visualizing a first object representing the first dataset and a second object representing the second dataset, and connecting the first object and the second object with a connecting line as an expression of the first association relationship; an operation of determining and visualizing the height of the connecting line based on the strength of the first association relationship; an operation of analyzing a second association relationship between the first dataset and the second dataset; and an operation of determining and visualizing the color or thickness of the connecting line based on the strength of the second association relationship.

So far, various embodiments of the present invention and the effects of those embodiments have been described with reference to FIGS. 1 to 8. The effects according to the technical idea of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

The technical idea of the present invention described so far with reference to FIGS. 1 to 8 may be implemented as computer-readable code on a computer-readable medium. The computer-readable recording medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, hard disk built into a computer). The computer program recorded on the computer-readable recording medium may be transmitted to another computing device over a network such as the Internet and installed on that computing device, so that it can be used there.

Although all the components constituting the embodiments of the present invention have been described above as being combined into one or operating in combination, the technical idea of the present invention is not necessarily limited to such embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated.

Although operations are shown in a particular order in the drawings, this should not be understood to mean that the operations must be performed in the particular order shown or in sequential order, or that all of the illustrated operations must be performed, to obtain a desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various components in the embodiments described above should not be understood as requiring such separation, and it should be understood that the described program components and systems may generally be integrated together into a single software product or packaged into multiple software products.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within their equivalent scope should be interpreted as falling within the scope of rights of the technical idea defined by the present invention.

Claims

A method for visualizing dataset associations, performed by a computing device, the method comprising:
analyzing a plurality of datasets to identify a first dataset and a second dataset having a first association relationship;
visualizing a first object representing the first dataset and a second object representing the second dataset, and connecting the first object and the second object with a connecting line as an expression of the first association relationship;
determining and visualizing a height of the connecting line based on a strength of the first association relationship;
analyzing a second association relationship between the first dataset and the second dataset;
determining and visualizing a color or thickness of the connecting line based on a strength of the second association relationship;
receiving, from a user, a selection of the visualized connecting line; and
displaying detailed association information including data of the first dataset and data of the second dataset on which the calculation of the strength of the first association relationship and the strength of the second association relationship is based,
wherein the connecting line is a parabola, and
wherein the height of the parabola increases in proportion to the strength of the first association relationship.
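The parabolic connecting line recited above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `parabola_points`, the scaling factor `height_per_unit`, and the placement of both objects on a horizontal baseline are assumptions made only for this example.

```python
def parabola_points(x1, x2, strength, height_per_unit=1.0, samples=21):
    """Return (x, y) points of a parabola joining (x1, 0) and (x2, 0).

    The apex height is height_per_unit * strength, so it grows in
    proportion to the strength of the first association relationship.
    height_per_unit is a hypothetical scaling factor.
    """
    if samples < 2:
        raise ValueError("samples must be at least 2")
    apex = height_per_unit * strength
    points = []
    for i in range(samples):
        t = i / (samples - 1)           # 0 at the first object, 1 at the second
        x = x1 + t * (x2 - x1)
        y = 4.0 * apex * t * (1.0 - t)  # zero at both ends, apex at t = 0.5
        points.append((x, y))
    return points
```

Doubling `strength` doubles the apex height while the end points stay fixed on the two objects.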

delete

The method of claim 1,
wherein the first association relationship is a space-based association relationship,
the method further comprising calculating a distance between a spatial value included in the first dataset and a spatial value included in the second dataset, and calculating the strength of the first association relationship between the first dataset and the second dataset using the calculated distance.
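One way to derive a strength from the calculated distance is sketched below. The claim does not fix a formula, so everything here is an assumption for illustration: spatial values are taken to be (latitude, longitude) pairs in degrees, the distance is the great-circle (haversine) distance, and the mapping `1 / (1 + d / scale_km)` with the hypothetical parameter `scale_km` simply makes closer datasets score higher.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(min(1.0, math.sqrt(a)))

def spatial_strength(coord_a, coord_b, scale_km=100.0):
    """Map a distance to a strength in (0, 1]; identical locations give 1.0."""
    distance = haversine_km(coord_a[0], coord_a[1], coord_b[0], coord_b[1])
    return 1.0 / (1.0 + distance / scale_km)
```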

The method of claim 1,
wherein the thickness of the connecting line is proportional to the strength of the second association relationship.

The method of claim 1,
wherein the second association relationship is a time-based association relationship, and
wherein analyzing the second association relationship comprises
calculating a difference between a time value included in the first dataset and a time value included in the second dataset, and calculating the strength of the second association relationship between the first dataset and the second dataset using the difference.
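A time-based strength of this kind might be computed as in the sketch below. The claim only requires that the difference between the time values be used; the ISO 8601 input strings, the function name `temporal_strength`, and the decay formula with the hypothetical parameter `scale_days` are assumptions for illustration.

```python
from datetime import datetime

def temporal_strength(time_a, time_b, scale_days=30.0):
    """Map the gap between two ISO 8601 time values to a strength in (0, 1].

    Identical times give 1.0; a gap of scale_days gives 0.5.
    """
    a = datetime.fromisoformat(time_a)
    b = datetime.fromisoformat(time_b)
    diff_days = abs((a - b).total_seconds()) / 86400.0
    return 1.0 / (1.0 + diff_days / scale_days)
```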

The method of claim 1,
wherein the second association relationship is an item-based association relationship, and
wherein analyzing the second association relationship comprises
analyzing a degree of matching between items included in the first dataset and items included in the second dataset, and calculating the strength of the second association relationship between the first dataset and the second dataset using the analyzed matching information.
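The degree of item matching could be measured in many ways. The sketch below uses the Jaccard index (shared items divided by all distinct items) as one plausible choice. The claim does not name this measure, and `item_match_strength` is a hypothetical helper.

```python
def item_match_strength(items_a, items_b):
    """Jaccard index of two item-name collections, in the range [0, 1]."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0  # no items on either side: treat as no association
    return len(a & b) / len(a | b)
```

For example, datasets with items `["region", "date", "price"]` and `["date", "price", "volume"]` share two of four distinct items, giving a strength of 0.5.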

The method of claim 1, further comprising:
analyzing the plurality of datasets to group datasets having the same subject; and
visualizing objects representing the grouped datasets with the same graphic representation.

The method of claim 7,
wherein the visualizing with the same graphic representation comprises
extracting keywords and metadata from a dataset, and identifying a subject of the dataset using the keywords and metadata.

The method of claim 7,
wherein the visualizing with the same graphic representation comprises
identifying a subject of the grouped datasets, and visualizing a plurality of objects representing the grouped datasets so that the objects have a color corresponding to the subject.
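Assigning one color per subject can be sketched as follows. The sketch assumes the subject of each dataset has already been identified (for example from keywords and metadata); the palette values and the function name `color_by_subject` are hypothetical.

```python
from collections import defaultdict

# Hypothetical palette; any list of distinct colors would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]

def color_by_subject(subject_of):
    """Map each dataset name to a color shared by all datasets of its subject.

    subject_of: dict of dataset name -> already-identified subject.
    Colors repeat if there are more subjects than palette entries.
    """
    groups = defaultdict(list)
    for dataset, subject in subject_of.items():
        groups[subject].append(dataset)
    colors = {}
    for index, subject in enumerate(sorted(groups)):
        for dataset in groups[subject]:
            colors[dataset] = PALETTE[index % len(PALETTE)]
    return colors
```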

The method of claim 1,
wherein the connecting with the connecting line comprises
visualizing the first object and the second object on a ring-shaped diagram, and
wherein the objects are arranged at equal intervals on the ring-shaped diagram.
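Placing objects at equal intervals on a ring amounts to dividing the full angle evenly among them. A minimal sketch, assuming a circle centered at the origin and a hypothetical `ring_layout` helper:

```python
import math

def ring_layout(names, radius=1.0):
    """Return {name: (x, y)} with the objects spaced evenly on a circle."""
    count = len(names)
    positions = {}
    for index, name in enumerate(names):
        angle = 2.0 * math.pi * index / count  # equal angular step per object
        positions[name] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions
```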

The method of claim 1, further comprising:
collecting a dataset;
determining whether an item included in the collected dataset has a standard item name; and
in response to determining that the item does not have a standard item name, changing the item included in the collected dataset to the standard item name.
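The item-name standardization step might look like the sketch below. The set of standard names and the synonym table are assumptions supplied by the caller; the claims do not say how non-standard names are mapped to standard ones.

```python
def standardize_item_names(record, synonyms, standard_names):
    """Return a copy of record whose item names follow the standard names.

    record: dict of item name -> value from a collected dataset.
    synonyms: hypothetical mapping of non-standard name -> standard name.
    standard_names: set of names already considered standard.
    Names that are neither standard nor in synonyms are kept unchanged.
    """
    result = {}
    for name, value in record.items():
        if name in standard_names:
            result[name] = value
        else:
            result[synonyms.get(name, name)] = value
    return result
```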

The method of claim 1, further comprising:
collecting a dataset;
determining whether a time value included in the collected dataset is in a standard format; and
in response to determining that the time value included in the collected dataset is not in the standard format, changing the time value included in the collected dataset to the standard format.
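Converting time values to a standard format can be sketched as follows. The choice of `YYYY-MM-DD` as the standard format and the list of accepted input formats in `KNOWN_FORMATS` are assumptions for illustration, since the claims do not specify them.

```python
from datetime import datetime

# Hypothetical set of input formats the collector is assumed to encounter.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d", "%Y%m%d")

def to_standard_date(value, standard="%Y-%m-%d"):
    """Parse value with the first matching known format and re-emit it
    in the standard format; raise ValueError if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime(standard)
        except ValueError:
            continue
    raise ValueError(f"unrecognized time value: {value!r}")
```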

The method of claim 1, further comprising:
collecting a dataset;
determining whether a spatial value included in the collected dataset is in a standard format; and
in response to determining that the spatial value included in the collected dataset is not in the standard format, changing the spatial value included in the collected dataset to the standard format.

delete

A visualization modeling device comprising:
a storage unit that stores a plurality of datasets;
an association analysis unit that extracts a first dataset and a second dataset from the storage unit, and analyzes a first association relationship and a second association relationship between the extracted first dataset and second dataset; and
a dataset visualization unit that visualizes a first object representing the first dataset and a second object representing the second dataset, connects the first object and the second object with a parabola as an expression of the first association relationship, determines a height of the parabola so that the height increases in proportion to a strength of the first association relationship, and determines and visualizes a color or thickness of the parabola based on a strength of the second association relationship,
wherein, when the visualized parabola is selected by a user, the dataset visualization unit displays detailed association information including data of the first dataset and data of the second dataset on which the calculation of the strength of the first association relationship and the strength of the second association relationship is based.

The device of claim 16,
wherein the association analysis unit determines whether the first dataset and the second dataset have the same subject, and
wherein the dataset visualization unit visualizes the first object and the second object in the same color if they are determined to have the same subject.

A computing device comprising:
one or more processors;
a memory into which a program executed by the one or more processors is loaded; and
a storage in which the program is stored,
wherein the program includes instructions for performing operations of:
analyzing a plurality of datasets and selecting a first dataset and a second dataset having a first association relationship;
visualizing a first object representing the first dataset and a second object representing the second dataset, and connecting the first object and the second object with a parabola as an expression of the first association relationship;
determining and visualizing a height of the parabola so that the height increases in proportion to a strength of the first association relationship;
analyzing a second association relationship between the first dataset and the second dataset;
determining and visualizing a color or thickness of the parabola based on a strength of the second association relationship;
receiving, from a user, a selection of the visualized parabola; and
displaying detailed association information including data of the first dataset and data of the second dataset on which the calculation of the strength of the first association relationship and the strength of the second association relationship is based.

A computer-readable non-transitory storage medium storing instructions,
wherein the instructions, when executed by a processor, cause the processor to perform operations of:
analyzing a plurality of datasets and selecting a first dataset and a second dataset having a first association relationship;
visualizing a first object representing the first dataset and a second object representing the second dataset, and connecting the first object and the second object with a parabola as an expression of the first association relationship;
determining and visualizing a height of the parabola so that the height increases in proportion to a strength of the first association relationship;
analyzing a second association relationship between the first dataset and the second dataset;
determining and visualizing a color or thickness of the parabola based on a strength of the second association relationship;
receiving, from a user, a selection of the visualized parabola; and
displaying detailed association information including data of the first dataset and data of the second dataset on which the calculation of the strength of the first association relationship and the strength of the second association relationship is based.