KR102614051B1

KR102614051B1 - Fraud detection method using graph database, and a computer program recorded on a recording medium for executing the same

Info

Publication number: KR102614051B1
Application number: KR1020230092388A
Authority: KR
Inventors: 탁정수; 박정현
Original assignee: (주)인포시즈
Priority date: 2023-07-17
Filing date: 2023-07-17
Publication date: 2023-12-14

Abstract

The present invention proposes a method for detecting anomalies using a graph database. The method may include: converting all or part of the data in a relational database into a graph database; generating a graph corresponding to all or part of the nodes, edges, and properties that make up the converted graph database; and inputting the graph into an artificial intelligence (AI) model trained for anomaly detection (fraud detection), then determining whether an anomaly is present based on the result value output from the AI model.

Description

Method for detecting anomalies using a graph database, and a computer program recorded on a recording medium for executing the same {Fraud detection method using graph database, and a computer program recorded on a recording medium for executing the same}

The present invention relates to big data. More specifically, it relates to a method for detecting anomalies (fraud detection) using a graph database, and to a computer program recorded on a recording medium for executing the method.

A graph is a data structure consisting of a finite, non-empty set of nodes and a finite set of edges, each of which is a pair of nodes. Nodes are also called vertices or points, and edges are also called lines or relationships. Graphs are classified as undirected graphs, in which the pair of nodes forming an edge is unordered, and directed graphs, in which the pair is ordered.
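
The definition above can be illustrated with a minimal Python sketch. The `Graph` class and its method names are hypothetical and chosen only for this example; the specification does not prescribe any particular representation.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A set of nodes plus a set of edges, where each edge is a pair of nodes."""
    directed: bool = False
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, u, v):
        self.nodes.update((u, v))
        # Undirected edges are unordered pairs; directed edges are ordered pairs.
        self.edges.add((u, v) if self.directed else frozenset((u, v)))

    def has_edge(self, u, v):
        key = (u, v) if self.directed else frozenset((u, v))
        return key in self.edges


undirected = Graph(directed=False)
undirected.add_edge("A", "B")

directed = Graph(directed=True)
directed.add_edge("A", "B")
```

In the undirected graph the edge can be looked up in either order, while in the directed graph only the stored direction matches.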

A conventional relational database stores and manages data in tables composed of records and columns, using Structured Query Language (SQL). Relational databases have been widely used because they reflect logical relationships in the data by expressing the relationship between two entities through foreign keys in a table. However, as the volume of data to be processed has grown rapidly, extending or changing the database structure with typical relational queries has come to require a large amount of computation. In addition, a relational database can express the relationship between two entities only in a limited way, so it has been of limited use for intuitively expressing the flexible relationships among many entities that are entangled like a spider web.

Graph databases emerged to overcome these limitations of conventional relational databases. A graph database represents data and the relationships between data as the nodes and edges of a graph, and is a non-relational (NoSQL) database that does not use Structured Query Language.

More specifically, graph databases can be classified by intended use into property graphs and Resource Description Framework (RDF) graphs.

First, a property graph consists of nodes that hold information about a subject and edges that represent relationships between nodes. Each node and edge can have attributes called properties. Property graphs are generally used to model relationships in data and to analyze data based on the modeled relationships.
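
As a rough illustration of a property graph, the sketch below attaches a dictionary of properties to each node and edge. The `Node` and `Edge` classes, the labels, and the banking data are all invented for this example and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


# Hypothetical banking data, invented for illustration only.
nodes = {
    "c1": Node("c1", "Customer", {"name": "Alice"}),
    "a1": Node("a1", "Account", {"balance": 1200}),
    "a2": Node("a2", "Account", {"balance": 50}),
}
edges = [
    Edge("c1", "a1", "OWNS"),
    Edge("a1", "a2", "TRANSFERRED_TO", {"amount": 900}),
]


def outgoing(node_id, label):
    """Return the edges with the given label that leave node_id."""
    return [e for e in edges if e.source == node_id and e.label == label]


# Analysis based on the modeled relationship: transfers above 500 out of a1.
large = [e for e in outgoing("a1", "TRANSFERRED_TO") if e.properties["amount"] > 500]
```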

An RDF graph follows the World Wide Web Consortium (W3C) standard designed to represent statements. A statement in an RDF graph (an RDF triple) consists of two nodes and an edge connecting them. Each node and edge can be identified by a Uniform Resource Identifier (URI). RDF graphs are generally used to integrate and represent complex metadata and master data.
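
A statement of this kind can be pictured as a (subject, predicate, object) tuple. The sketch below uses plain Python tuples rather than an RDF library, and the `http://example.org/` URIs are placeholders chosen for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# The example.org URIs are placeholders, not real resources.
EX = "http://example.org/"
triples = {
    (EX + "alice", EX + "owns", EX + "account1"),
    (EX + "account1", EX + "transferredTo", EX + "account2"),
}


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


owned = objects(EX + "alice", EX + "owns")
```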

Korean Patent Application Publication No. 10-2022-0100791, "Methods, systems, and media for solving graph database queries" (published July 18, 2022)

One object of the present invention is to provide a method for detecting anomalies using a graph database.

Another object of the present invention is to provide a computer program recorded on a recording medium for executing the method for detecting anomalies.

The technical problems addressed by the present invention are not limited to those mentioned above; other technical problems not mentioned here will be clearly understood by those skilled in the art from the description below.

To achieve the technical objects described above, the present invention proposes a method for detecting anomalies using a graph database.

The method may include: converting all or part of the data in a relational database into a graph database; generating a graph corresponding to all or part of the nodes, edges, and properties that make up the converted graph database; and inputting the graph into an artificial intelligence (AI) model trained for anomaly detection (fraud detection), then determining whether an anomaly is present based on the result value output from the AI model.
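
The first step, converting relational data into graph form, might look roughly like the sketch below. It assumes two hypothetical tables held as lists of dictionaries, with `customer_id` acting as a foreign key; the function name `rows_to_graph` and the `OWNS` edge label are likewise invented for illustration, and the specification does not fix a particular mapping.

```python
# Hypothetical relational tables: each row is a dict, "customer_id" is a foreign key.
customers = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
accounts = [
    {"id": 10, "customer_id": 1, "balance": 1200},
    {"id": 11, "customer_id": 2, "balance": 50},
]


def rows_to_graph(customers, accounts):
    """Turn rows into nodes and foreign-key references into edges."""
    nodes, edges = {}, []
    for row in customers:
        nodes[("Customer", row["id"])] = {"name": row["name"]}
    for row in accounts:
        nodes[("Account", row["id"])] = {"balance": row["balance"]}
        # The foreign key becomes an explicit edge rather than a column value.
        edges.append((("Customer", row["customer_id"]), "OWNS", ("Account", row["id"])))
    return nodes, edges


nodes, edges = rows_to_graph(customers, accounts)
```

Non-key columns become node properties, while the foreign-key column disappears from the node and reappears as an edge.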

More specifically, the step of determining whether an anomaly is present may input a two-dimensional image representing the structure of the graph into an AI model implemented as a convolutional neural network (CNN).

According to one embodiment, the determining step may change the resolution of the two-dimensional image to generate an image group consisting of a plurality of two-dimensional images with different resolutions, input each two-dimensional image in the image group into the CNN-based AI model, and then determine whether an anomaly is present by combining the plurality of result values output from the AI model.
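
The multi-resolution embodiment can be sketched as follows. Everything here is an assumption made for illustration: the image is a nested list of pixel values, resolution is changed by average pooling, `toy_model` is a placeholder for the trained CNN, and the results are combined by simple averaging against an arbitrary threshold. The specification does not state how the results are combined.

```python
def downsample(image, factor):
    """Average-pool a 2D list of pixel values by an integer factor."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[r * factor + dr][c * factor + dc]
                for dr in range(factor) for dc in range(factor)) / factor ** 2
            for c in range(w)
        ]
        for r in range(h)
    ]


def toy_model(image):
    """Placeholder for the trained CNN: returns the fraction of bright pixels."""
    pixels = [p for row in image for p in row]
    return sum(p > 0.5 for p in pixels) / len(pixels)


def detect(image, factors=(1, 2, 4), threshold=0.1):
    group = [downsample(image, f) for f in factors]   # the image group
    scores = [toy_model(img) for img in group]        # one result per image
    combined = sum(scores) / len(scores)              # combine by averaging
    return combined > threshold, scores


# 4x4 image whose top-left 2x2 block is bright.
image = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
flagged, scores = detect(image)
```

In this toy input the bright block is visible at the two finer resolutions and washed out at the coarsest one, which is why a combination of results can differ from any single result.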

According to another embodiment, the determining step may generate an image group consisting of two-dimensional images of a plurality of subgraphs, each having only some of the nodes, edges, and properties that make up the graph, input each two-dimensional image in the image group into the CNN-based AI model, and then determine whether an anomaly is present by combining the plurality of result values output from the AI model.
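
A minimal sketch of the subgraph embodiment follows. The way subgraphs are chosen here (one subgraph per node, containing only the edges that touch it), the placeholder `toy_score` standing in for rendering plus CNN inference, and the use of the maximum as the combination rule are all assumptions for illustration.

```python
def ego_subgraph(edges, center):
    """Keep only the edges that touch `center` (one simple way to pick a subgraph)."""
    kept = [(u, v) for u, v in edges if center in (u, v)]
    nodes = {n for e in kept for n in e}
    return nodes, kept


def toy_score(nodes, edges):
    """Placeholder for rendering the subgraph and running the CNN: edges per node."""
    return len(edges) / len(nodes) if nodes else 0.0


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
all_nodes = sorted({n for e in edges for n in e})
scores = {n: toy_score(*ego_subgraph(edges, n)) for n in all_nodes}
combined = max(scores.values())  # combine by taking the maximum
```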

Meanwhile, the CNN may have a transformer encoder-decoder structure.

In this case, the determining step may arrange the data constituting the two-dimensional image in a single line to convert it into one sequence and input the sequence into the encoder. The encoder applies self-attention to convert the sequence into a single vector in which the data positions within the sequence are linked to one another, and the resulting vector is input into the decoder. The decoder may then predict the spatial pattern corresponding to the anomaly using a loss function based on the Hungarian algorithm.
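
The three ingredients named above can be sketched in plain Python. This is a toy under stated assumptions, not the model of the specification: tokens are single pixel values, the self-attention uses identity query/key/value projections instead of learned ones, and the matching step is a brute-force search that solves the same minimum-cost assignment problem the Hungarian algorithm solves, without implementing that algorithm. The cost matrix is invented.

```python
import math
from itertools import permutations


def flatten(image):
    """Row-major flattening of a 2D image into a sequence of 1-element tokens."""
    return [[pixel] for row in image for pixel in row]


def self_attention(seq):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(seq[0])
    out = []
    for q in seq:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(logits)
        weights = [math.exp(x - m) for x in logits]
        total = sum(weights)
        # Each output token is a weighted mix of every token in the sequence.
        out.append([sum(w / total * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out


def match(cost):
    """Minimum-cost one-to-one assignment of predictions (rows) to targets (columns).

    Brute force over permutations; the Hungarian algorithm solves the same
    problem in polynomial time.
    """
    n = len(cost)
    best = min(permutations(range(n)), key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best), sum(cost[i][best[i]] for i in range(n))


tokens = flatten([[0.0, 1.0], [1.0, 0.0]])
attended = self_attention(tokens)
assignment, loss = match([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
```

A loss of this kind first pairs each prediction with a target at minimum total cost and then sums the cost of the chosen pairs, so the order in which predictions are emitted does not matter.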

Characteristically, when the encoder performs a convolution operation on an input feature map to generate an output feature map, it may apply a learnable offset to the size of the convolution filter, which is the region from which features are extracted, so that features are extracted from a grid region wider than the convolution filter itself.
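
The idea of shifting each filter tap by an offset resembles what is commonly called deformable convolution. The sketch below computes a single output value of a 3x3 filter whose nine taps are each displaced by an offset and read with bilinear interpolation. The offsets are fixed by hand here purely for illustration; in a trained model they would be learned. The function names and the test image are assumptions.

```python
import math


def bilinear(image, y, x):
    """Sample a 2D list at fractional (y, x); positions outside the image read as 0."""
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy, xx in ((y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)):
        if 0 <= yy < h and 0 <= xx < w:
            value += image[yy][xx] * (1 - abs(y - yy)) * (1 - abs(x - xx))
    return value


def deformable_conv_at(image, weights, offsets, cy, cx):
    """One output value of a 3x3 convolution whose taps are shifted by offsets."""
    total = 0.0
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for k, (dy, dx) in enumerate(taps):
        oy, ox = offsets[k]  # hand-set here; learned in a trained model
        total += weights[k] * bilinear(image, cy + dy + oy, cx + dx + ox)
    return total


# 5x5 image that is zero everywhere except the top-left corner.
image = [[0.0] * 5 for _ in range(5)]
image[0][0] = 9.0
weights = [1 / 9] * 9

regular = deformable_conv_at(image, weights, [(0, 0)] * 9, 2, 2)
# Push every tap outward by one pixel: the 3x3 filter now samples a 5x5 area.
spread = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
dilated = deformable_conv_at(image, weights, spread, 2, 2)
```

With zero offsets the filter centered at (2, 2) never sees the corner pixel; with the outward offsets it does, which is the "wider grid region" effect described above.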

In addition, when performing the self-attention, the encoder may use the attention key as the offset.

Meanwhile, the step of generating the graph may individually set the size of each node as rendered in the two-dimensional image according to the properties of that node, and individually set the thickness of each edge as rendered in the two-dimensional image according to the properties of that edge.
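
One simple way to derive sizes and thicknesses from properties is a linear mapping, sketched below. The choice of properties (account balance for node size, transfer amount for edge thickness), the pixel ranges, and the helper name `scale` are hypothetical.

```python
def scale(value, lo, hi, out_min, out_max):
    """Linearly map value from [lo, hi] to [out_min, out_max]."""
    if hi == lo:
        return float(out_min)
    return out_min + (value - lo) / (hi - lo) * (out_max - out_min)


# Hypothetical properties: account balance drives node size,
# transfer amount drives edge thickness.
balances = {"a1": 100, "a2": 1000, "a3": 550}
amounts = {("a1", "a2"): 10, ("a2", "a3"): 510}

node_size = {n: scale(b, min(balances.values()), max(balances.values()), 4, 20)
             for n, b in balances.items()}
edge_width = {e: scale(a, min(amounts.values()), max(amounts.values()), 1, 5)
              for e, a in amounts.items()}
```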

Alternatively, the step of generating the graph may individually set the color of each node as rendered in the two-dimensional image according to the properties of that node, and individually set the color of each edge as rendered in the two-dimensional image according to the properties of that edge.
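
Color assignment from a categorical property can be sketched as a palette lookup. The property names (`type`, `channel`), the palette values, and the gray fallback are assumptions made for this example.

```python
# Hypothetical palettes: a node "type" or edge "channel" property selects an RGB color.
NODE_COLORS = {"Customer": (31, 119, 180), "Account": (255, 127, 14)}
EDGE_COLORS = {"online": (44, 160, 44), "branch": (214, 39, 40)}
DEFAULT = (127, 127, 127)  # used when the property is missing or unknown


def color_for(properties, key, palette):
    """Look up the color for properties[key], falling back to DEFAULT."""
    return palette.get(properties.get(key), DEFAULT)


node_color = color_for({"type": "Account", "balance": 50}, "type", NODE_COLORS)
edge_color = color_for({"channel": "online"}, "channel", EDGE_COLORS)
```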

To achieve the technical objects described above, the present invention also proposes a computer program recorded on a recording medium for executing the method for detecting anomalies. The computer program may be combined with a computing device that includes a memory and a processor that processes instructions resident in the memory. The computer program may be recorded on a recording medium to execute: a step in which the processor converts all or part of the data in a relational database into a graph database; a step in which the processor generates a graph corresponding to all or part of the nodes, edges, and properties that make up the converted graph database; and a step in which the processor inputs the graph into an AI model trained for anomaly detection and then determines whether an anomaly is present based on the result value output from the AI model.

Specific details of other embodiments are included in the detailed description and the drawings.

According to embodiments of the present invention, detecting anomalies from data visualized as a graph makes it possible to detect even anomalies that arise from complex relationships, which were difficult to detect with conventional table-format or matrix-format databases.

In particular, the offset that is applied during the convolution operation to expand the grid region from which features are extracted is used as the key of the encoder's attention. This makes it possible to detect even anomalies in which nodes and edges are densely packed in a narrow region, which were difficult to detect with AI implemented as a conventional CNN.

Figure 1 is a configuration diagram showing a data analysis system according to an embodiment of the present invention.
Figure 2 is a logical configuration diagram of a data analysis device according to an embodiment of the present invention.
Figure 3 is an example diagram illustrating a graph database converted from a relational database according to an embodiment of the present invention.
Figures 4 and 5 are example diagrams illustrating the forms of graphs visualized according to various embodiments of the present invention.
Figure 6 is an example diagram illustrating a user interface (UI) for exploring how a graph changes over time according to an embodiment of the present invention.
Figures 7 to 9 are example diagrams illustrating cases identified as events according to various embodiments of the present invention.
Figure 10 is an example diagram illustrating the structure of an artificial intelligence (AI) model according to an embodiment of the present invention.
Figure 11 is an example diagram illustrating the convolution operation of an encoder according to an embodiment of the present invention.
Figure 12 is an example diagram illustrating the self-attention of an encoder according to an embodiment of the present invention.
Figures 13 and 14 are example diagrams illustrating the forms of images that can be input to an AI model according to various embodiments of the present invention.
Figure 15 is a hardware configuration diagram of a data analysis device according to an embodiment of the present invention.
Figure 16 is a flowchart illustrating a data analysis method according to an embodiment of the present invention.

Note that the technical terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Unless specifically defined otherwise in this specification, technical terms used here should be interpreted as they are generally understood by those of ordinary skill in the art to which the present invention pertains, and should not be interpreted in an overly broad or overly narrow sense. If a technical term used in this specification is an incorrect term that fails to express the idea of the present invention accurately, it should be understood as replaced by a technical term that those skilled in the art can understand correctly. General terms used in the present invention should be interpreted according to their dictionary definitions or according to context, and should not be interpreted in an overly narrow sense.

Singular expressions used in this specification include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" should not be construed as necessarily including all of the components or steps described in the specification; some components or steps may be omitted, and additional components or steps may be included.

Terms containing ordinal numbers, such as first and second, may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that component, or other components may exist in between. In contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Identical or similar components are given the same reference numbers regardless of the figure, and redundant descriptions are omitted. Detailed descriptions of related known technologies are omitted where they could obscure the gist of the present invention. The accompanying drawings are intended only to make the idea of the present invention easy to understand and should not be construed as limiting it. The idea of the present invention should be construed as extending to all modifications, equivalents, and substitutes beyond the accompanying drawings.

The present invention proposes a means of converting a conventional relational database into a graph database and then using the graph database to track how data relationships change and to detect anomalies.

Figure 1 is a configuration diagram showing a data analysis system according to an embodiment of the present invention.

As shown in Figure 1, the data analysis system according to an embodiment of the present invention may include various types of user terminals (100a, 100b, …, 100n; 100), a service providing device 200, and a data analysis device 300.

These components of the data analysis system merely represent functionally distinct elements. In an actual physical environment, two or more components may be implemented together, or a single component may be implemented as separate parts.

Turning to each component, the user terminal 100 is a terminal that a user can control in order to use the service provided by the operating entity of the service providing device 200.

For example, if the operating entity of the service providing device 200 (i.e., the service provider) is a financial institution that provides one-stop banking, the user terminal 100 may be a smartphone capable of running applications such as open banking or mobile banking for using that service. However, the type of user terminal is not limited to a smartphone. It may be a stationary computing device such as a desktop, workstation, or server, or a mobile computing device such as a laptop, tablet, phablet, portable multimedia player (PMP), personal digital assistant (PDA), or e-book reader.

While the user of the user terminal 100 uses the service provided by the operating entity of the service providing device 200, various forms of data are inevitably generated, and the user terminal 100 may transmit this data to the service providing device 200.

Next, the service providing device 200 is a device that can be controlled by the service provider (i.e., the operating entity) in order to provide a service to the user of the user terminal 100.

For example, when the user of the user terminal 100 uses a one-stop banking service, the service providing device 200 may be a data server operated by a financial institution to provide that service. However, the type of service providing device 200 is not limited to a server; it may be a stationary computing device such as a desktop or workstation.

The service providing device 200 may manage and store the various data received from the user terminal 100 in a relational database. Here, a relational database is a database that stores and manages data in tables composed of records and columns, using Structured Query Language (SQL).

Next, the data analysis device 300 is a device for analyzing the data generated by the user terminal 100 and managed and stored by the service providing device 200. In particular, the data analysis device 300 according to embodiments of the present invention can convert a relational database into a graph database and then use the graph database to track changes in data relationships and detect anomalies.

The specific configuration and operation of the data analysis device 300 are described in more detail below with reference to Figures 2 to 16.

Meanwhile, the various types of user terminals 100, the service providing device 200, and the data analysis device 300 that make up the data analysis system may transmit and receive data over a network combining one or more of a secure line directly connecting the devices, a public wired communication network, and a mobile communication network.

For example, the public wired communication network may include, but is not limited to, Ethernet, x Digital Subscriber Line (xDSL), Hybrid Fiber Coax (HFC), and Fiber To The Home (FTTH). The mobile communication network may include, but is not limited to, Code Division Multiple Access (CDMA), Wideband CDMA (WCDMA), High Speed Packet Access (HSPA), Long Term Evolution (LTE), and 5th generation mobile telecommunication (5G).

The configuration of the data analysis device 300, which has the features described above, is described in more detail below.

Figure 2 is a logical configuration diagram of a data analysis device according to an embodiment of the present invention.

As shown in Figure 2, the data analysis device 300 according to an embodiment of the present invention may include a communication unit 305, an input/output unit 310, a database conversion unit 315, a graph modeling unit 320, a data relationship tracking unit 325, and an anomaly detection unit 330.

These components of the data analysis device 300 merely represent functionally distinct elements. In an actual physical environment, two or more components may be implemented together, or a single component may be implemented as separate parts.

Turning to each component, the communication unit 305 may transmit and receive data to and from one or more of the user terminal 100 and the service providing device 200.

Specifically, the communication unit 305 may receive data on the relational database from the service providing device 200. Here, the data on the relational database is data generated by the user terminal 100 and managed and stored by the service providing device 200.

The communication unit 305 may transmit a query written in Structured Query Language (SQL) to the service providing device 200 and receive data in response, or may receive data unilaterally based on a contract with the operator of the service providing device 200.

The communication unit 305 may transmit data including the user interface (UI) generated by the data relationship tracking unit 325 to the service providing device 200. In addition, the communication unit 305 may transmit information about an anomaly determined by the anomaly detection unit 330 to the service providing device 200.

As the next component, the input/output unit 310 may receive as input, or output, various types of data related to data analysis.

Specifically, the input/output unit 310 may receive data from a storage device based on a contract with the operator of the service providing device 200.

The input/output unit 310 may output a graph visualized by the graph modeling unit 320. The input/output unit 310 may output the user interface (UI) generated by the data relationship tracking unit 325. In addition, the input/output unit 310 may output information about an anomaly determined by the anomaly detection unit 330.

The data relating to the relational database that is received by the communication unit 305 or input through the input/output unit 310 may be organized in the form of a table and may have a CSV (Comma-Separated Values) or JSON (JavaScript Object Notation) format, but is not limited thereto.
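As a non-limiting illustration of the table-form input described above, the following Python sketch reads CSV or JSON text into a list of row dictionaries. The function names `rows_from_csv` and `rows_from_json` and the sample columns are hypothetical and do not appear in the original disclosure.

```python
import csv
import io
import json

def rows_from_csv(text):
    """Parse CSV text whose first line is a header into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def rows_from_json(text):
    """Parse JSON text holding one record or a list of records."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]

# Hypothetical sample data: the same table expressed in both formats.
csv_rows = rows_from_csv("id,owner\n1,kim\n2,lee\n")
json_rows = rows_from_json('[{"id": 1, "owner": "kim"}, {"id": 2, "owner": "lee"}]')
```

Note that the CSV reader yields every value as a string, so a type conversion step would be needed before the two inputs can be treated identically.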

As the next component, the database conversion unit 315 may convert all or part of the data included in the relational database into a graph database.

Here, a relational database is a database that stores and manages data in tables composed of records and columns, using Structured Query Language (SQL). A graph database is a non-relational (NoSQL) database that represents data and the relationships among data as the nodes and edges of a graph, and that does not use a structured query language.

Referring briefly to FIG. 3, the operation of the database conversion unit 315 will be described.

FIG. 3 is an exemplary diagram illustrating a graph database converted from a relational database according to an embodiment of the present invention.

Referring to FIG. 3, the database conversion unit 315 may first access the relational database. For example, the database conversion unit 315 may access the relational database through a JDBC (Java DataBase Connectivity) driver. The database conversion unit 315 is then able to query the tables of the accessed relational database.

The database conversion unit 315 may individually create the nodes that will constitute the graph database, corresponding to all or some of the entities included in the relational database. Here, an entity is an abstract object used to represent a real-world object in a relational database. One entity may correspond to one table, but in some cases one entity may correspond to a plurality of tables.

The database conversion unit 315 may individually create the edges that will constitute the graph database, corresponding to the relations among all or some of the entities included in the relational database. Here, a relation is a unit that separates or links information in a relational database.

The database conversion unit 315 may assign properties to one or more of the nodes and edges that will constitute the graph database, corresponding to all or some of the attributes included in the relational database. Here, an attribute is a specific characteristic that describes an entity or a relation.

The database conversion unit 315 may then generate the graph database by batch processing the individually created nodes, edges, and properties.
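As a non-limiting illustration, the conversion steps above can be sketched in Python as follows. The sketch assumes an in-memory SQLite database as a stand-in for the relational database accessed through a driver, and the table names `account` and `transfer`, the column names, and the function `relational_to_graph` are hypothetical; none of them is taken from the original disclosure.

```python
import sqlite3

def relational_to_graph(conn):
    """Convert two illustrative tables into node and edge records with properties."""
    nodes, edges = {}, []
    # Each row of the entity table becomes one node; its columns become properties.
    for acc_id, owner in conn.execute("SELECT id, owner FROM account"):
        nodes[("Account", acc_id)] = {"owner": owner}
    # Each row of the relation table becomes one directed edge with properties.
    for src, dst, amount in conn.execute("SELECT src, dst, amount FROM transfer"):
        edges.append((("Account", src), ("Account", dst),
                      {"type": "TRANSFER", "amount": amount}))
    return nodes, edges

# Hypothetical sample schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT);
    CREATE TABLE transfer (src INTEGER REFERENCES account(id),
                           dst INTEGER REFERENCES account(id),
                           amount REAL);
    INSERT INTO account VALUES (1, 'kim'), (2, 'lee');
    INSERT INTO transfer VALUES (1, 2, 500.0);
""")
nodes, edges = relational_to_graph(conn)
```

In this sketch the nodes and edges are collected first and returned together, which loosely mirrors the batch step; writing them into an actual graph database is outside the scope of the example.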

Referring again to FIG. 2, the description of the components of the data analysis device 300 continues.

As the next component, the graph modeling unit 320 may visualize all or part of the data included in the graph database in the form of a graph.

Here, a graph is a data structure composed of a non-empty finite set of nodes and a finite set of edges, each of which is a pair of nodes. A node of the graph is also called a vertex or a point, and an edge is also called a link, a line, or a relationship.

Specifically, the graph modeling unit 320 may generate a graph corresponding to all or some of the nodes, edges, and properties constituting the graph database. The graph modeling unit 320 may then express the structure of the generated graph as a two-dimensional image.

According to an embodiment, in the graph expressed as a two-dimensional image by the graph modeling unit 320, all nodes and edges may have the same size, thickness, and color, as shown in FIG. 3. However, the present invention is not limited thereto, and the graph modeling unit 320 may also produce a two-dimensional image in which the nodes or edges of the graph have various sizes, thicknesses, or colors.

Referring briefly to FIGS. 4 and 5, examples of the two-dimensional images generated by the graph modeling unit 320 will be described.

FIGS. 4 and 5 are exemplary diagrams illustrating the forms of graphs visualized according to various embodiments of the present invention.

As shown in FIG. 4, in expressing the structure of the graph as a two-dimensional image, the graph modeling unit 320 may individually set the size of each node to be drawn in the two-dimensional image according to the properties of that node. The graph modeling unit 320 may also individually set the thickness of each edge to be drawn in the two-dimensional image according to the properties of that edge.

Alternatively, as shown in FIG. 5, in expressing the structure of the graph as a two-dimensional image, the graph modeling unit 320 may individually set the color of each node to be drawn in the two-dimensional image according to the properties of that node. The graph modeling unit 320 may also individually set the color of each edge to be drawn in the two-dimensional image according to the properties of that edge.
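As a non-limiting illustration, the property-dependent size, thickness, and color settings described for FIGS. 4 and 5 could be sketched in Python as below. The scaling rules, the pixel ranges, the `PALETTE` table, and all function names are assumptions made only for this example; the original disclosure does not specify how a property is mapped to a visual value.

```python
def node_size(value, lo, hi, min_px=10.0, max_px=40.0):
    """Linearly scale a numeric node property in [lo, hi] to a radius in pixels."""
    if hi == lo:
        return min_px
    t = (value - lo) / (hi - lo)
    return min_px + t * (max_px - min_px)

def edge_thickness(amount, per_unit=0.01, min_w=1.0, max_w=8.0):
    """Scale a numeric edge property to a line width, clamped to [min_w, max_w]."""
    return max(min_w, min(max_w, amount * per_unit))

# Hypothetical mapping from a categorical node property to a color.
PALETTE = {"customer": "#1f77b4", "merchant": "#ff7f0e"}

def node_color(kind, default="#7f7f7f"):
    """Look up a color for a categorical property, falling back to gray."""
    return PALETTE.get(kind, default)
```

For example, `node_size(5, 0, 10)` gives a radius halfway between the minimum and maximum, and an unknown category passed to `node_color` is drawn in the default gray.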

Referring again to FIG. 2, the description of the components of the data analysis device 300 continues.

Meanwhile, for the data relationship tracking unit 325, the graph modeling unit 320 may generate a plurality of two-dimensional images of a graph that changes over time. More specifically, the graph modeling unit 320 may identify the times at which each of the nodes, edges, and properties constituting the graph database was individually created, deleted, or modified. The graph modeling unit 320 may generate, for each identified time, a graph corresponding to the nodes, edges, and properties at that time. The graph modeling unit 320 may then express the structure of each graph generated for each time as a two-dimensional image.
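As a non-limiting illustration, generating one graph per identified time could be sketched in Python as follows. The sketch assumes the changes are available as a log of `(timestamp, operation, kind, item)` tuples; this log format and the function name `snapshots` are hypothetical, and property modifications are ignored for brevity.

```python
def snapshots(change_log):
    """Replay a change log in time order and return {timestamp: (nodes, edges)}."""
    nodes, edges, result = set(), set(), {}
    for ts, op, kind, item in sorted(change_log, key=lambda c: c[0]):
        target = nodes if kind == "node" else edges
        if op == "create":
            target.add(item)
        elif op == "delete":
            target.discard(item)
            if kind == "node":
                # Removing a node also removes every edge attached to it.
                edges = {e for e in edges if item not in e}
        # Freeze the current state so later changes do not alter this snapshot.
        result[ts] = (frozenset(nodes), frozenset(edges))
    return result

# Hypothetical change log: edges are (source, destination) tuples.
log = [
    (1, "create", "node", "a"),
    (2, "create", "node", "b"),
    (3, "create", "edge", ("a", "b")),
    (4, "delete", "node", "b"),
]
snaps = snapshots(log)
```

Each snapshot could then be rendered as its own two-dimensional image.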

For the anomaly detection unit 330, the graph modeling unit 320 may generate an image group having various resolutions. More specifically, the graph modeling unit 320 may change the resolution of a two-dimensional image already generated for the graph to create an image group composed of a plurality of two-dimensional images with different resolutions.
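As a non-limiting illustration, an image group with different resolutions could be derived from one image as sketched below in Python. Average pooling is assumed here as the resampling method, and the names `downsample` and `image_group` are hypothetical; the original disclosure does not state how the resolution is changed. The image is represented as a plain two-dimensional list of pixel values, and rows or columns that do not fill a complete block are dropped.

```python
def downsample(image, factor):
    """Average-pool a 2D list of pixel values by an integer factor."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [[sum(image[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)]

def image_group(image, factors=(1, 2, 4)):
    """Build a group of images of the same graph at several resolutions."""
    return [downsample(image, f) for f in factors]

# Hypothetical 4x4 image whose pixel value is its row-major index.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
group = image_group(img)
```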

In addition, for the anomaly detection unit 330, the graph modeling unit 320 may generate two-dimensional images of subgraphs. More specifically, the graph modeling unit 320 may generate an image group composed of two-dimensional images of a plurality of subgraphs, each having only some of the nodes, edges, and properties that constitute the graph.
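As a non-limiting illustration, a set of subgraphs could be extracted as sketched below in Python. Taking one neighborhood per node is only one possible way to choose subgraphs and is an assumption of this example, as are the function names `induced_subgraph` and `neighborhood_subgraphs`.

```python
def induced_subgraph(nodes, edges, keep):
    """Keep only the chosen nodes and the edges whose endpoints both survive."""
    keep = set(keep) & set(nodes)
    return keep, [(u, v) for u, v in edges if u in keep and v in keep]

def neighborhood_subgraphs(nodes, edges):
    """Return one subgraph per node: the node together with its direct neighbors."""
    groups = []
    for n in nodes:
        near = {n} | {v for u, v in edges if u == n} | {u for u, v in edges if v == n}
        groups.append(induced_subgraph(nodes, edges, near))
    return groups

# Hypothetical three-node path graph a - b - c.
subs = neighborhood_subgraphs(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

Each resulting subgraph could then be rendered as a separate two-dimensional image and added to the image group.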

As the next component, the data relationship tracking unit 325 may provide a user interface (UI) for tracking relationships among data that change over time.

Specifically, based on the two-dimensional images of the graph structure generated for each time by the graph modeling unit 320, the data relationship tracking unit 325 may generate a user interface (UI) through which the change process of the graph can be explored in time series. The data relationship tracking unit 325 may then transmit data including the generated user interface through the communication unit 305, or output it through the input/output unit 310.

Referring briefly to FIGS. 6 to 9, the user interface (UI) for exploring the change process of the graph will be described.

FIG. 6 is an exemplary diagram illustrating a user interface (UI) through which the change process of a graph can be explored in time series according to an embodiment of the present invention.

As shown in FIG. 6, the user interface (UI) generated by the data relationship tracking unit 325 may include a two-dimensional image and a slide bar for time-series exploration.

Here, the slide bar is a user interface element that receives, from the user, a command to move the slider left and right or up and down in order to control the exploration time. The two-dimensional image displays the structure of the graph corresponding to the time entered by the user through the slide bar.
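As a non-limiting illustration, the lookup from a slide bar position to the image to display could be sketched in Python as follows. The sketch assumes the snapshot timestamps are kept in ascending order and that the slider reports a position between 0.0 and 1.0; both assumptions and the function names are hypothetical.

```python
from bisect import bisect_right

def slider_to_time(position, t_start, t_end):
    """Map a slider position in [0.0, 1.0] to a time in [t_start, t_end]."""
    return t_start + position * (t_end - t_start)

def snapshot_at(timestamps, images, selected_time):
    """Return the image of the latest snapshot taken at or before selected_time."""
    i = bisect_right(timestamps, selected_time)
    return images[i - 1] if i else None

# Hypothetical snapshots taken at times 10, 20, and 30.
timestamps = [10, 20, 30]
images = ["graph@10", "graph@20", "graph@30"]
shown = snapshot_at(timestamps, images, slider_to_time(0.75, 10, 30))
```

A binary search keeps the lookup fast even when the graph has a long change history, and a time earlier than the first snapshot yields no image.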

In particular, the user interface (UI) according to an embodiment of the present invention may further include a graphic indicating the point in time at which a notable change occurred in the graph structure.

To this end, the data relationship tracking unit 325 may identify, among the cases in which the nodes, edges, and properties constituting the graph database are individually created, deleted, or modified, events that satisfy preset criteria. The data relationship tracking unit 325 may then add a graphic indicating that an event occurred at the position on the user interface corresponding to the time at which the identified event occurred.

In this case, the preset criteria may include, but are not limited to, a case in which one or more of an articulation point, a self-loop, a cycle, a complete graph, and a biconnected component is created, deleted, or modified within the graph. Here, an articulation point is a node whose removal, together with all edges incident to it, separates the graph into two or more subgraphs. A self-loop is an edge whose tail node (where the edge starts) and head node (where the edge arrives) are the same node. A cycle is a path whose start node and end node are the same and in which the one or more nodes connected by edges between the start node and the end node are all distinct. A complete graph is a graph with n vertices that has n(n-1)/2 edges, which is the maximum number of edges a simple undirected graph with n vertices can have. A biconnected component is a graph that has no articulation point.
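As a non-limiting illustration, two of the structures defined above can be checked with a few lines of Python. The sketch assumes that edges are given as `(tail, head)` pairs whose endpoints all belong to the node list, and that the graph is treated as undirected when testing completeness; the function names are hypothetical.

```python
def self_loops(edges):
    """Return the edges whose tail node and head node are the same."""
    return [(u, v) for u, v in edges if u == v]

def is_complete(nodes, edges):
    """True if the number of distinct undirected edges equals n(n-1)/2."""
    n = len(set(nodes))
    # Ignore self-loops and treat (u, v) and (v, u) as the same edge.
    pairs = {frozenset((u, v)) for u, v in edges if u != v}
    return len(pairs) == n * (n - 1) // 2

# Hypothetical triangle graph, which is complete for n = 3 (3 * 2 / 2 = 3 edges).
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
```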

FIGS. 7 to 9 are exemplary diagrams illustrating cases identified as events according to various embodiments of the present invention.

Describing an embodiment with reference to FIG. 7, when a node corresponding to an articulation point exists among the nodes constituting the graph database, the data relationship tracking unit 325 may identify the creation, deletion, or modification of that node as an event.
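As a non-limiting illustration, articulation points can be found with the well-known depth-first search method based on discovery times and low-link values. The original disclosure does not name an algorithm, so this choice is an assumption, as is the function name. The sketch assumes a simple undirected graph given as an adjacency dictionary and uses recursion, so very deep graphs would need an iterative variant.

```python
def articulation_points(adj):
    """Return the set of articulation points of a simple undirected graph.

    adj maps each node to an iterable of its neighbors.
    """
    disc, low, result = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u is a cut vertex if a child subtree cannot reach above u.
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # The DFS root is a cut vertex only if it has two or more DFS children.
        if parent is None and children > 1:
            result.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return result

# Hypothetical graphs: a path a - b - c and a triangle.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
tri = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
```

Comparing the result before and after a change to the graph would show whether an articulation point was created or removed.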

Describing another embodiment with reference to FIG. 8, when an edge corresponding to a self-loop exists among the edges constituting the graph database, the data relationship tracking unit 325 may identify the creation, deletion, or modification of that edge as an event.

Describing yet another embodiment with reference to FIG. 9, when the number of nodes included in a cycle formed by two or more nodes constituting the graph database is greater than or equal to a preset threshold, the data relationship tracking unit 325 may identify the creation, deletion, or modification of that cycle as an event.
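As a non-limiting illustration, the threshold test on cycle size could be sketched in Python as below. The sketch assumes a directed graph given as an adjacency dictionary and collects only the cycles closed by back edges during a depth-first search, so it reports at least one cycle whenever one exists but does not enumerate every cycle. The function names are hypothetical.

```python
def find_cycles(adj):
    """Collect the cycles closed by DFS back edges in a directed graph.

    adj maps each node to a list of successor nodes.
    """
    cycles, state, stack = [], {}, []

    def dfs(u):
        state[u] = "active"
        stack.append(u)
        for v in adj.get(u, []):
            if state.get(v) == "active":
                # Back edge u -> v closes a cycle made of the stack from v up to u.
                cycles.append(stack[stack.index(v):])
            elif v not in state:
                dfs(v)
        stack.pop()
        state[u] = "done"

    for node in adj:
        if node not in state:
            dfs(node)
    return cycles

def cycle_events(adj, threshold):
    """Keep only the cycles whose node count reaches the preset threshold."""
    return [c for c in find_cycles(adj) if len(c) >= threshold]

# Hypothetical graph with a three-node cycle and a self-loop on node d.
g = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["d"]}
```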

As the next component, the anomaly detection unit 330 may input the graph generated by the graph modeling unit 320 into an artificial intelligence (AI) model trained for anomaly detection (fraud detection), and then determine the presence or absence of an anomaly based on the result value output from the AI model.

Hereinafter, the process of detecting an anomaly using the artificial intelligence (AI) model will be described with reference to FIGS. 10 to 14.

FIG. 10 is an exemplary diagram illustrating the structure of an artificial intelligence (AI) model according to an embodiment of the present invention.

As shown in FIG. 10, the anomaly detection unit 330 may input a two-dimensional image expressing the structure of the graph into an artificial intelligence (AI) model implemented as a convolutional neural network (CNN). Here, the convolutional neural network (CNN) may have a transformer encoder-decoder structure.

More specifically, the anomaly detection unit 330 may arrange the data constituting the two-dimensional image in a single row to convert it into one sequence, and input the sequence to the encoder. The anomaly detection unit 330 may apply self-attention in the encoder to convert the sequence into a single vector in which the data positions within the sequence are related to one another, and then input the converted vector to the decoder. The anomaly detection unit 330 outputs a result value from the decoder using a loss function based on the Hungarian algorithm. The anomaly detection unit 330 may then input the result value output from the decoder into a feed-forward network (FFN) to predict a spatial pattern corresponding to an anomaly.
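As a non-limiting illustration of the first two steps above, the following Python sketch flattens a two-dimensional image into a sequence and applies scaled dot-product self-attention to it. To stay short, the sketch uses identity query, key, and value projections instead of learned weights, and it omits the decoder, the Hungarian-algorithm-based loss, and the feed-forward network; these simplifications and the function names are assumptions of the example, not features of the original model.

```python
import math

def flatten(image):
    """Flatten a 2D image row by row into a sequence of 1-element feature vectors."""
    return [[float(p)] for row in image for p in row]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(seq[0])
    out = []
    for q in seq:
        # Score the query against every position, then mix the values by weight.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out

# Hypothetical 2x2 binary image.
seq = flatten([[0, 1], [1, 0]])
attended = self_attention(seq)
```

Each output element depends on every position of the sequence, which is how self-attention relates distant parts of the image to one another.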

FIG. 11 is an exemplary diagram illustrating the convolution operation of an encoder according to an embodiment of the present invention.

As shown in FIG. 11, in generating an output feature map by performing a convolution operation on an input feature map, the encoder of the convolutional neural network (CNN) according to an embodiment of the present invention may be modified to perform the convolution operation with a learnable offset reflected in the size of the convolution filter, which is the region from which features are extracted. With this modification, the encoder can extract features from a grid region wider than the fixed size of the convolution filter.
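As a non-limiting illustration, one way to let offsets widen the sampling region of a convolution filter is to shift each filter tap by an offset and read the input with bilinear interpolation, in the manner of a deformable convolution. The original disclosure does not name this technique, so treating it as such is an assumption, as are the function names. The offsets below are fixed numbers, whereas in a trained model they would be learned.

```python
import math

def bilinear(img, y, x):
    """Sample img at fractional (y, x); positions outside the image count as 0."""
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < len(img) and 0 <= xx < len(img[0]):
                total += wy * wx * img[yy][xx]
    return total

def offset_conv_at(img, kernel, cy, cx, offsets):
    """Compute one output value of a 3x3 convolution with shifted taps.

    offsets[i][j] = (dy, dx) is added to the regular grid position of tap (i, j).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            out += kernel[i][j] * bilinear(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

# Hypothetical 5x5 input whose pixel value is its row-major index.
img = [[r * 5 + c for c in range(5)] for r in range(5)]
ones = [[1.0] * 3 for _ in range(3)]
no_shift = [[(0.0, 0.0)] * 3 for _ in range(3)]
half_right = [[(0.0, 0.5)] * 3 for _ in range(3)]
```

With zero offsets the result equals an ordinary 3x3 convolution, while non-zero offsets let the same nine taps read positions outside the regular 3x3 window.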

FIG. 12 is an exemplary diagram illustrating the self-attention of an encoder according to an embodiment of the present invention.

As shown in FIG. 12, in performing self-attention, the encoder of the convolutional neural network (CNN) according to an embodiment of the present invention may be modified to use the offset reflected in the size of the convolution filter as the key of the attention. With this modification, a large offset is learned when a large object must be predicted from the two-dimensional image, and a small offset is learned when a small object must be predicted. This can improve on conventional convolutional neural networks (CNNs), whose prediction performance for relatively small objects was low.

Meanwhile, rather than determining the presence or absence of an anomaly from a single two-dimensional image, the anomaly detection unit 330 may make the determination based on a plurality of two-dimensional images derived in various ways from one graph.

FIGS. 13 and 14 are exemplary diagrams illustrating the forms of images that can be input to the artificial intelligence (AI) model according to various embodiments of the present invention.

As shown in FIG. 13, the anomaly detection unit 330 may input all images included in an image group into the artificial intelligence (AI) model implemented as a convolutional neural network (CNN). The anomaly detection unit 330 may then determine the presence or absence of an anomaly by combining the plurality of result values output from the AI model. Here, the image group may be a group, generated by the graph modeling unit 320, of a plurality of two-dimensional images with different resolutions. Alternatively, the image group may be a group, generated by the graph modeling unit 320, of two-dimensional images of a plurality of subgraphs, each having only some of the nodes, edges, and properties constituting the graph.
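As a non-limiting illustration, combining the result values obtained for the images of one image group could be sketched in Python as follows. The sketch assumes each result value is an anomaly score between 0 and 1 and fuses the scores by their mean or their maximum; the fusion rule, the threshold of 0.5, and the function name `combine_scores` are assumptions, since the original disclosure does not specify how the values are combined.

```python
def combine_scores(scores, threshold=0.5, mode="mean"):
    """Fuse per-image anomaly scores into (is_anomaly, fused_score)."""
    if not scores:
        raise ValueError("at least one score is required")
    fused = max(scores) if mode == "max" else sum(scores) / len(scores)
    return fused >= threshold, fused

# Hypothetical scores for three images of the same graph.
flag, fused = combine_scores([0.2, 0.9, 0.7])
```

Using the mean makes the decision robust to a single outlying image, whereas using the maximum flags the graph as soon as any one image looks anomalous.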

Hereinafter, the hardware of the data analysis device 300 for realizing the logical components described above will be described in more detail.

FIG. 15 is a hardware configuration diagram of a data analysis device according to an embodiment of the present invention.

As shown in FIG. 15, the data analysis device 300 may include a processor 350, a memory 355, a transceiver 360, an input/output device 365, a data bus 370, and storage 375.

Specifically, the processor 350 may implement the operations and functions of the data analysis device 300 based on instructions of the software 380a, resident in the memory 355, in which the data relationship tracking method and/or the anomaly detection method is implemented.

The software 380b stored in the storage 375, in which the data relationship tracking method and/or the anomaly detection method is implemented, may be loaded into the memory 355.

The input/output device 365 may, according to commands from the processor 350, receive signals required for the operation of the data analysis device 300 or output operation results to the outside.

The data bus 370 is connected to each of the processor 350, the memory 355, the transceiver 360, the input/output device 365, and the storage 375, and may serve as a path for transferring signals between the components.

The storage 375 may store an application programming interface (API), library files, resource files, and the like that are required to execute the software 380a in which the data relationship tracking method and/or the anomaly detection method according to embodiments of the present invention is implemented. The storage 375 may store the software 380b in which the data relationship tracking method and/or the anomaly detection method according to embodiments of the present invention is implemented. The storage 375 may also store the artificial intelligence (AI) model implemented based on a convolutional neural network (CNN).

According to an embodiment of the present invention, the software 380a, 380b for implementing the data relationship tracking method, resident in the memory 355 or stored in the storage 375, may be a computer program recorded on a recording medium to execute the steps of: converting, by the processor 350, all or part of the data included in a relational database into a graph database; identifying, by the processor 350, the times at which each of the nodes, edges, and properties constituting the converted graph database was individually created, deleted, or modified, and generating, for each identified time, a graph corresponding to the nodes, edges, and properties; and generating, by the processor 350, a user interface (UI) through which the change process of the graphs generated for each time can be explored in time series.

According to another embodiment of the present invention, the software 380a, 380b for implementing the anomaly detection method, resident in the memory 355 or stored in the storage 375, may be a computer program recorded on a recording medium to execute the steps of: converting, by the processor 350, all or part of the data included in a relational database into a graph database; generating, by the processor 350, a graph corresponding to all or some of the nodes, edges, and properties constituting the converted graph database; and inputting, by the processor 350, the graph into an artificial intelligence (AI) model trained for anomaly detection, and then determining the presence or absence of an anomaly based on the result value output from the AI model.

More specifically, the processor 350 may include, but is not limited to, one or more of a central processing unit (CPU), an application-specific integrated circuit (ASIC), a chipset, and a logic circuit.

The memory 355 may include, but is not limited to, one or more of read-only memory (ROM), random access memory (RAM), flash memory, and a memory card.

The input/output device 365 may include, but is not limited to, one or more of input devices such as a button, a switch, a keyboard, a mouse, and a joystick, and output devices such as a liquid crystal display (LCD), a light emitting diode (LED), an organic light emitting diode (OLED), an active matrix organic light emitting diode (AMOLED), a printer, and a plotter.

When the embodiments included in this specification are implemented in software, the method described above may be implemented as modules (procedures, functions, and the like) that each perform the functions described above. Each module may reside in the memory 355 and be executed by the processor 350. The memory 355 may be located inside or outside the processor 350 and may be connected to the processor 350 by various well-known means.

Each component shown in FIG. 15 may be implemented by various means (for example, hardware, firmware, software, or a combination thereof). When implemented in hardware, an embodiment of the present invention may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

When implemented in firmware or software, an embodiment of the present invention may be implemented in the form of modules, procedures, functions, and the like that perform the functions or operations described above, and may be recorded on a recording medium readable by various computer means. Here, the recording medium may contain program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those of ordinary skill in the computer software field. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM (Compact Disk Read Only Memory) and DVD (Digital Video Disk); magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Hereinafter, the operation of the data analysis device 300 described above will be explained in more detail.

FIG. 16 is a flowchart illustrating a data analysis method according to an embodiment of the present invention.

Referring to FIG. 16, the data analysis device 300 according to an embodiment of the present invention may collect data from a relational database (S100).

Specifically, the data analysis device 300 may transmit a Structured Query Language (SQL) query to the service providing device 200 and collect the data returned in response, or it may collect data unilaterally under a contract with the operator of the service providing device 200. The data analysis device 300 may also receive data from a storage device under a contract with the operator of the service providing device 200.

Next, the data analysis device 300 may convert all or part of the data included in the relational database into a graph database (S200).

Specifically, the data analysis device 300 may individually create the nodes that will make up the graph database, corresponding to all or some of the entities included in the relational database. It may individually create the edges that will make up the graph database, corresponding to the relations between all or some of those entities. It may assign properties to one or more of the nodes and edges, corresponding to all or some of the attributes included in the relational database. The data analysis device 300 may then create the graph database by batch processing the individually created nodes, edges, and properties.
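
The mapping in step S200 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `account` and `transfer` tables, their columns, and the function name `relational_to_graph` are hypothetical, and an in-memory SQLite database stands in for the relational source.

```python
import sqlite3

def relational_to_graph(conn):
    """Map entity rows to nodes, relation rows to edges, and columns to properties."""
    nodes, edges = {}, []
    # Each row of the (hypothetical) entity table becomes one node;
    # its attributes become node properties.
    for acc_id, owner, country in conn.execute(
            "SELECT id, owner, country FROM account ORDER BY id"):
        nodes[acc_id] = {"label": "Account", "owner": owner, "country": country}
    # Each row of the (hypothetical) relation table becomes one directed edge;
    # its remaining attributes become edge properties.
    for src, dst, amount, ts in conn.execute(
            "SELECT src, dst, amount, ts FROM transfer ORDER BY ts"):
        edges.append({"type": "TRANSFER", "src": src, "dst": dst,
                      "amount": amount, "ts": ts})
    return nodes, edges

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, country TEXT);
CREATE TABLE transfer (src INTEGER REFERENCES account(id),
                       dst INTEGER REFERENCES account(id),
                       amount REAL, ts INTEGER);
INSERT INTO account VALUES (1, 'A', 'KR'), (2, 'B', 'KR'), (3, 'C', 'US');
INSERT INTO transfer VALUES (1, 2, 100.0, 10), (2, 3, 95.0, 11), (3, 1, 90.0, 12);
""")
nodes, edges = relational_to_graph(conn)
```

In a real deployment the resulting node and edge records would be loaded into a graph database in one batch rather than kept in Python containers.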

Next, the data analysis device 300 may visualize all or part of the data included in the graph database in the form of a graph (S300).

Specifically, the data analysis device 300 may generate a graph corresponding to all or part of the nodes, edges, and properties that make up the graph database, and may then render the structure of the generated graph as a two-dimensional image. According to one embodiment, all nodes and edges in the rendered two-dimensional image may have the same size, thickness, and color. However, the invention is not limited to this, and the data analysis device 300 may also render a two-dimensional image in which the nodes or edges of the graph have varying sizes, thicknesses, or colors.
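
As a rough sketch of step S300, the function below rasterizes a small graph into a two-dimensional grid in which every node and every edge is drawn identically, matching the uniform-style embodiment. The circular layout, the grid size, the pixel codes, and the name `render_graph` are all assumptions made for illustration; the source does not specify a layout algorithm.

```python
import math

def render_graph(node_ids, edges, size=32):
    """Rasterize a graph into a size x size grid: 0 = background, 1 = edge, 2 = node."""
    img = [[0] * size for _ in range(size)]
    center = (size - 1) / 2
    radius = 0.8 * center
    # Hypothetical layout: place the nodes evenly on a circle.
    pos = {}
    for i, n in enumerate(node_ids):
        angle = 2 * math.pi * i / len(node_ids)
        pos[n] = (round(center + radius * math.cos(angle)),
                  round(center + radius * math.sin(angle)))
    # Draw each edge as a one-pixel-wide straight line between its endpoints.
    for u, v in edges:
        (x0, y0), (x1, y1) = pos[u], pos[v]
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for s in range(steps + 1):
            x = round(x0 + (x1 - x0) * s / steps)
            y = round(y0 + (y1 - y0) * s / steps)
            img[y][x] = 1
    # Draw nodes last so they stay visible on top of the edges.
    for x, y in pos.values():
        img[y][x] = 2
    return img, pos

img, pos = render_graph([1, 2, 3], [(1, 2), (2, 3), (3, 1)])
```

Varying node size, edge thickness, or color by property, as the alternative embodiment allows, would replace the constant pixel codes with values derived from each element's properties.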

Next, the data analysis device 300 may provide a user interface (UI) for tracking how the relationships in the data change over time (S400).

Specifically, based on the two-dimensional images of the graph structure generated at each point in time, the data analysis device 300 may generate a user interface (UI) in which the evolution of the graph can be explored as a time series. The data analysis device 300 may then transmit or output data containing the generated user interface.

In particular, the user interface (UI) according to an embodiment of the present invention may further include a graphic indicating the point in time at which a notable change occurred in the graph structure. To this end, the data analysis device 300 may identify, among the cases in which nodes, edges, and properties of the graph database were individually created, deleted, or modified, the events that satisfy preset criteria. The data analysis device 300 may then add a graphic indicating that an event occurred at the position on the user interface that corresponds to the time of the identified event.
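
The event identification described above can be illustrated with a small sketch. The source does not say what the preset criteria are, so the criterion here (a timestamp qualifies when at least `threshold` graph changes occur at it) is purely an assumption, as are the change-log format and the name `find_event_markers`.

```python
from collections import Counter

def find_event_markers(change_log, threshold=3):
    """Return the sorted timestamps at which the number of graph changes
    (create, delete, or modify) meets the assumed threshold criterion."""
    counts = Counter(ts for ts, _operation, _target in change_log)
    return sorted(ts for ts, n in counts.items() if n >= threshold)

# Hypothetical change log: (timestamp, operation, affected element).
change_log = [
    (1, "create", "node:1"),
    (2, "create", "edge:1->2"),
    (5, "create", "edge:2->3"),
    (5, "modify", "node:2.risk"),
    (5, "delete", "edge:1->2"),
    (7, "modify", "edge:2->3.amount"),
]
markers = find_event_markers(change_log)  # timestamps to flag on the timeline
```

Each returned timestamp corresponds to one position on the timeline where the UI would draw the event graphic.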

Next, the data analysis device 300 may input the graph into an artificial intelligence (AI) model trained for anomaly detection and then determine the presence or absence of an anomaly based on the result value output from the AI model (S500).

Specifically, the data analysis device 300 may arrange the data constituting the two-dimensional image in a row, convert it into a single sequence, and input the sequence to the encoder. In the encoder, it may apply self-attention to convert the sequence into a single vector in which the data positions in the sequence are connected to one another, and then input the converted vector to the decoder. In the decoder, it outputs a result value using a loss function based on the Hungarian algorithm. The data analysis device 300 may then input the result value output from the decoder into a feed-forward network (FFN) to predict the spatial pattern corresponding to the anomaly.
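
The three ideas in this step (flattening the image, self-attention over the sequence, and a matching-based loss) can be sketched in plain Python. Several simplifications are assumed and are not from the source: the attention uses identity query/key/value projections instead of learned ones, the pairwise cost is an L1 distance, and the optimal one-to-one assignment is found by brute-force search over permutations rather than by the Hungarian algorithm itself (both yield the same minimum, but brute force is only practical for tiny inputs). All function names are hypothetical.

```python
import math
from itertools import permutations

def flatten(image):
    """Arrange the rows of a 2D image one after another into a single sequence."""
    return [float(p) for row in image for p in row]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification).
    Every output position is a softmax-weighted mix of all input positions."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

def matching_loss(pred, target):
    """Minimum total L1 cost over all one-to-one assignments of predictions
    to targets. Assumes len(pred) == len(target)."""
    def cost(p, t):
        return sum(abs(a - b) for a, b in zip(p, t))
    return min(sum(cost(pred[i], target[j]) for i, j in enumerate(perm))
               for perm in permutations(range(len(target))))

seq = flatten([[0, 1], [2, 0]])
# Pair each value with a normalized position so attention can tell positions apart.
tokens = [[v, i / len(seq)] for i, v in enumerate(seq)]
attended = self_attention(tokens)
loss = matching_loss([[0, 0], [1, 1]], [[1, 1], [0, 1]])
```

Because the loss is computed on the best assignment, the order in which the model emits its predictions does not affect the result.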

In particular, the encoder of the convolutional neural network (CNN) according to an embodiment of the present invention is modified so that, when it performs a convolution operation on an input feature map to generate an output feature map, it applies a learnable offset to the size of the convolution filter, which is the region from which features are extracted. In addition, the encoder of the CNN may be modified so that, when performing self-attention, it uses the offset applied to the size of the convolution filter as the attention key.
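
One way to picture a convolution whose sampling region is shifted by offsets is the sketch below, which is similar in spirit to what the literature calls deformable convolution; whether the patented encoder matches that technique exactly is not stated in the source. Here each of the nine taps of a 3x3 filter samples the input at its regular position plus a per-tap offset, with bilinear interpolation for fractional positions. The offsets are passed in as fixed numbers for illustration, whereas in the described model they would be learned. The names `bilinear` and `offset_conv2d` are hypothetical.

```python
import math

def bilinear(img, y, x):
    """Sample img at a possibly fractional (y, x); outside the image reads as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * img[yy][xx]
    return val

def offset_conv2d(img, kernel, offsets):
    """3x3 convolution (cross-correlation form, as is usual in CNNs) in which
    tap k samples at its regular grid position plus offsets[k] = (dy, dx).
    kernel and offsets are flat lists of 9 entries in row-major tap order."""
    h, w = len(img), len(img[0])
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(
                kernel[k] * bilinear(img, y + dy + offsets[k][0], x + dx + offsets[k][1])
                for k, (dy, dx) in enumerate(taps))
    return out

# A single bright pixel in the top-left corner of a 5x5 image.
image = [[0.0] * 5 for _ in range(5)]
image[0][0] = 1.0
kernel = [1.0] + [0.0] * 8           # only the top-left tap is active
no_offset = [(0, 0)] * 9
widened = [(-1, -1)] + [(0, 0)] * 8  # push the top-left tap one step further out
plain = offset_conv2d(image, kernel, no_offset)
wide = offset_conv2d(image, kernel, widened)
```

With zero offsets the corner pixel is only visible from position (1, 1); with the widened offset it is visible from (2, 2), so the same 3x3 filter draws on a region larger than its nominal size.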

According to the embodiment of the present invention described so far, data included in a relational database is converted into a graph database and presented visually in graph form, so that complex relationships in the data can be grasped intuitively. In particular, providing a user interface (UI) in which relationships that change over time can be explored easily makes it possible to respond flexibly to a wide variety of changes in the data.

In addition, according to another embodiment of the present invention, anomalies are detected using data visualized in graph form, so that even anomalies arising from complex relationships, which were difficult to detect with conventional table-format or matrix-format databases, can be detected. In particular, the offset that is applied to widen the grid area from which features are extracted during the convolution operation is also used as the attention key of the encoder. This makes it possible to detect even anomalies in which nodes and edges are densely packed into a narrow area, which were difficult to detect with AI implemented by a conventional convolutional neural network (CNN).

As described above, preferred embodiments of the present invention have been disclosed in this specification and the drawings, but it is obvious to those of ordinary skill in the art to which the present invention pertains that other modifications based on the technical idea of the present invention can be practiced in addition to the embodiments disclosed here. Although specific terms are used in this specification and the drawings, they are used only in a general sense to explain the technical content of the present invention clearly and to aid understanding, and are not intended to limit the scope of the present invention. Accordingly, the above detailed description should not be construed as restrictive in any respect and should be regarded as illustrative. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.

100: User terminal 200: Service providing device
300: Data analysis device 305: Communication unit
310: Input/output unit 315: Database conversion unit
320: Graph modeling unit 325: Data relationship tracking unit
330: Anomaly detection unit

Claims

An anomaly detection method performed by a computing device, the method comprising:
converting all or part of the data included in a relational database into a graph database;
generating a graph corresponding to all or part of the nodes, edges, and properties constituting the converted graph database; and
inputting the graph into an artificial intelligence (AI) model trained for fraud detection, and then determining the presence or absence of an anomaly based on a result value output from the AI model,
wherein the determining of the presence or absence of the anomaly comprises
inputting a two-dimensional image expressing the structure of the graph into an AI model implemented as a convolutional neural network (CNN) having a transformer encoder-decoder structure, and
wherein the encoder,
in performing a convolution operation on an input feature map to generate an output feature map, performs the convolution operation with a learnable offset reflected in the size of the convolution filter, which is the region from which features are extracted, thereby extracting features from a grid area wider than the size of the convolution filter.

delete

The method of claim 1, wherein the determining of the presence or absence of the anomaly comprises
changing the resolution of the two-dimensional image to generate an image group consisting of a plurality of two-dimensional images with different resolutions, inputting each of the two-dimensional images included in the image group into the AI model implemented as the convolutional neural network (CNN), and then determining the presence or absence of the anomaly by combining the plurality of result values output from the AI model.

The method of claim 1, wherein the determining of the presence or absence of the anomaly comprises
generating an image group consisting of two-dimensional images of a plurality of subgraphs, each containing only some of the nodes, edges, and properties constituting the graph, inputting each of the two-dimensional images included in the image group into the AI model implemented as the convolutional neural network (CNN), and then determining the presence or absence of the anomaly by combining the plurality of result values output from the AI model.

The method of claim 1, wherein the determining of the presence or absence of the anomaly comprises
arranging the data constituting the two-dimensional image in a row to convert it into a single sequence and inputting the sequence into the encoder; applying self-attention in the encoder to convert the sequence into a single vector in which the data positions in the sequence are connected to one another, and inputting the converted vector into the decoder; and using, in the decoder, a loss function based on the Hungarian algorithm to predict a spatial pattern corresponding to the anomaly.

delete

The method of claim 5, wherein the encoder,
when performing the self-attention, uses the offset as the attention key.

The method of claim 1, wherein the generating of the graph comprises
individually setting the size of each node to be represented in the two-dimensional image according to the properties of the nodes constituting the graph, and individually setting the thickness of each edge to be represented in the two-dimensional image according to the properties of the edges constituting the graph.

The method of claim 1, wherein the generating of the graph comprises
individually setting the color of each node to be represented in the two-dimensional image according to the properties of the nodes constituting the graph, and individually setting the color of each edge to be represented in the two-dimensional image according to the properties of the edges constituting the graph.

A computer program recorded on a recording medium which, in combination with a computing device comprising
a memory; and
a processor that processes instructions resident in the memory,
causes the computing device to execute:
converting, by the processor, all or part of the data included in a relational database into a graph database;
generating, by the processor, a graph corresponding to all or part of the nodes, edges, and properties constituting the converted graph database; and
inputting, by the processor, the graph into an artificial intelligence (AI) model trained for anomaly detection, and then determining the presence or absence of an anomaly based on a result value output from the AI model,
wherein the determining of the presence or absence of the anomaly comprises
inputting a two-dimensional image expressing the structure of the graph into an AI model implemented as a convolutional neural network (CNN) having a transformer encoder-decoder structure, and
wherein the encoder,
in performing a convolution operation on an input feature map to generate an output feature map, performs the convolution operation with a learnable offset reflected in the size of the convolution filter, which is the region from which features are extracted, thereby extracting features from a grid area wider than the size of the convolution filter.