KR20230155943A

KR20230155943A - Method for embedding graph and system thereof

Info

Publication number: KR20230155943A
Application number: KR1020220133768A
Authority: KR
Inventors: 신재선; 권영준
Original assignee: 삼성에스디에스 주식회사
Priority date: 2022-05-04
Filing date: 2022-10-18
Publication date: 2023-11-13

Abstract

A graph embedding method and a system therefor are provided. A graph embedding method according to some embodiments of the present disclosure may include: obtaining a first embedding representation and a second embedding representation of a target graph; changing the value of the second embedding representation by reflecting a specific value in it; and generating an integrated embedding representation of the target graph by aggregating the first embedding representation and the changed second embedding representation. With this method, various embedding representations can be integrated without loss of information (or expressive power), and as a result a more powerful embedding representation of the target graph can be generated.

Description

Graph embedding method and system {METHOD FOR EMBEDDING GRAPH AND SYSTEM THEREOF}

The present disclosure relates to a graph embedding method and a system therefor.

Graph embedding refers to converting a given graph into a vector or matrix representation in an embedding space. Methods of embedding graphs using neural networks have recently been studied actively, and a neural network that handles graphs is called a GNN (Graph Neural Network).

Meanwhile, different GNNs (e.g., GNNs with different structures, or GNNs that operate in different ways) generate different embedding vectors for the same graph. Therefore, if these embedding vectors can be integrated without loss of information (or expressive power), a more powerful embedding representation can be generated.

To prevent information loss during integration, one may consider generating an integrated embedding vector by concatenating the embedding vectors as they are. However, this approach has an obvious problem: the dimensionality of the integrated embedding vector, and with it the complexity of the associated task (e.g., a downstream task such as classification or regression), grows in proportion to the number of concatenated embedding vectors.

Korean Patent Publication No. 10-2022-0032730 (published March 15, 2022)

A technical problem to be solved through some embodiments of the present disclosure is to provide a method capable of integrating various embedding representations of a graph without loss of information (or expressive power), and a system that performs the method.

Another technical problem to be solved through some embodiments of the present disclosure is to provide a method capable of integrating various embedding representations of a graph without increasing the complexity of the associated task, and a system that performs the method.

Yet another technical problem to be solved through some embodiments of the present disclosure is to provide a method capable of embedding the node information and the topology information of a graph together, and a system that performs the method.

The technical problems of the present disclosure are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those of ordinary skill in the art from the description below.

A graph embedding method according to some embodiments of the present disclosure for solving the above technical problems is a method performed by at least one computing device, and may include: obtaining a first embedding representation and a second embedding representation of a target graph; changing the value of the second embedding representation by reflecting a specific value in the second embedding representation; and generating an integrated embedding representation of the target graph by aggregating the first embedding representation and the changed second embedding representation.

In some embodiments, one of the first embedding representation and the second embedding representation may be generated by an embedding scheme that aggregates information of the neighboring nodes constituting the target graph, and the other may be generated by an embedding scheme that reflects topology information of the target graph.

In some embodiments, the first embedding representation and the second embedding representation may be generated by embedding the target graph through different GNNs (Graph Neural Networks).

In some embodiments, the specific value may be an irrational number.

In some embodiments, the specific value is a value based on a learnable parameter, and the method may further include: predicting a label for a predetermined task based on the generated integrated embedding representation; and updating the value of the learnable parameter based on the difference between the predicted label and the ground-truth label of the target graph.
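The predict-and-update loop described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the specific value is the learnable parameter itself (`eps`), a fixed linear predictor `w`, a squared-error loss, and plain gradient descent. The names `integrate`, `train_eps`, `w`, and `lr` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def integrate(h1, h2, eps):
    """Integrated representation: h1 + eps * h2 (multiply, then add)."""
    return [a + eps * b for a, b in zip(h1, h2)]


def train_eps(h1, h2, w, y_true, eps=1.0, lr=0.1, steps=50):
    """Update the learnable value eps from the prediction error.

    Assumed loss: (y_pred - y_true) ** 2 with y_pred = w . (h1 + eps * h2),
    so d(loss)/d(eps) = 2 * (y_pred - y_true) * (w . h2).
    """
    for _ in range(steps):
        y_pred = dot(w, integrate(h1, h2, eps))
        grad = 2.0 * (y_pred - y_true) * dot(w, h2)
        eps -= lr * grad
    return eps


# Toy data: the target label 3.0 is reached exactly when eps == 2.0.
eps = train_eps(h1=[1.0, 0.0], h2=[0.0, 1.0], w=[1.0, 1.0], y_true=3.0)
```

In a full model the GNNs and the predictor would normally be trained jointly with the parameter; only the scalar is updated here to keep the sketch short.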

In some embodiments, the specific value may be reflected in the second embedding representation by a multiplication operation, and the first embedding representation and the changed second embedding representation may be aggregated by an addition operation.

In some embodiments, obtaining the first embedding representation and the second embedding representation may include: obtaining a first embedding matrix and a second embedding matrix of the target graph, wherein the size of the first embedding matrix differs from that of the second embedding matrix; and obtaining the first embedding representation and the second embedding representation by performing a resizing operation on at least one of the first embedding matrix and the second embedding matrix.

In some embodiments, obtaining the first embedding representation and the second embedding representation may include: obtaining the first embedding representation through a GNN (Graph Neural Network) of the neighbor-node-information aggregating type; and generating the second embedding representation by extracting topology information of the target graph using the first embedding representation.

In some embodiments, generating the integrated embedding representation may include: performing a pooling operation on each of the first embedding representation and the changed second embedding representation; and generating the integrated embedding representation by aggregating the results of the pooling operations.
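A minimal sketch of this pool-then-aggregate order is given below. It assumes node-level embedding matrices stored as lists of rows, sum pooling over nodes, and multiplication by a scalar `eps` followed by addition. The text does not fix the pooling operator, and the names `sum_pool` and `integrate_pooled` are hypothetical.

```python
import math


def sum_pool(node_matrix):
    """Sum the node embeddings (rows) into one graph-level vector."""
    return [sum(column) for column in zip(*node_matrix)]


def integrate_pooled(m1, m2, eps):
    """Scale the second representation, pool both, then add the pooled vectors."""
    m2_changed = [[eps * v for v in row] for row in m2]
    p1, p2 = sum_pool(m1), sum_pool(m2_changed)
    return [a + b for a, b in zip(p1, p2)]


# Two 3-node, 2-dimensional node-level representations of the same graph.
m1 = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
m2 = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
z = integrate_pooled(m1, m2, eps=math.sqrt(2.0))
```

Because sum pooling is linear, scaling before or after pooling gives the same result here; with a non-linear pooling operator (e.g., max) the order would matter.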

In some embodiments, generating the integrated embedding representation may include: obtaining a third embedding representation of the target graph; changing the value of the third embedding representation by reflecting in it a value different from the specific value; and generating the integrated embedding representation by aggregating the first embedding representation, the changed second embedding representation, and the changed third embedding representation.
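The three-representation case can be sketched as a weighted sum with a distinct value per representation. The values used here (1, √2, √3) and the helper name `integrate_many` are illustrative assumptions; the text only requires that the value applied to the third representation differ from the one applied to the second.

```python
import math


def integrate_many(representations, scales):
    """Multiply each representation by its own value, then add them all."""
    if len(representations) != len(scales):
        raise ValueError("need exactly one scale per representation")
    dim = len(representations[0])
    out = [0.0] * dim
    for rep, scale in zip(representations, scales):
        if len(rep) != dim:
            raise ValueError("all representations must have the same size")
        for i, value in enumerate(rep):
            out[i] += scale * value
    return out


h1, h2, h3 = [1.0, 2.0], [2.0, 1.0], [0.0, 3.0]
# The first representation is left unchanged (scale 1.0).
z = integrate_many([h1, h2, h3], [1.0, math.sqrt(2.0), math.sqrt(3.0)])
```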

A graph embedding system according to some embodiments of the present disclosure for solving the above technical problems includes one or more processors and a memory that stores one or more instructions, wherein the one or more processors, by executing the stored one or more instructions, may perform: an operation of obtaining a first embedding representation and a second embedding representation of a target graph; an operation of changing the value of the second embedding representation by reflecting a specific value in the second embedding representation; and an operation of generating an integrated embedding representation of the target graph by aggregating the first embedding representation and the changed second embedding representation.

A computer program according to some embodiments of the present disclosure for solving the above technical problems may be stored in a computer-readable recording medium in order to execute, in combination with a computing device: obtaining a first embedding representation and a second embedding representation of a target graph; changing the value of the second embedding representation by reflecting a specific value in the second embedding representation; and generating an integrated embedding representation of the target graph by aggregating the first embedding representation and the changed second embedding representation.

According to some embodiments of the present disclosure, an integrated embedding representation of a target graph can be generated by reflecting a specific value in at least some of the various embedding representations of the target graph and then aggregating them. In this case, the specific value can prevent the values of different embedding representations from canceling each other out during aggregation, so the various embedding representations can be integrated without loss of information (or expressive power). For example, multiplying a particular embedding representation by an irrational number can effectively prevent the values of different embedding representations from canceling out during aggregation (e.g., addition).
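The cancellation effect can be illustrated with a small numeric example. The sketch below assumes addition as the aggregation and √2 as the specific value; the vectors are made up for illustration. Note that a floating-point `math.sqrt(2.0)` is only a rational approximation of √2, so the code demonstrates the idea rather than the exact guarantee. For rational-valued entries, a1 + √2·b1 = a2 + √2·b2 holds only if a1 = a2 and b1 = b2, which is why an irrational multiplier keeps the two contributions separable.

```python
import math


def add(h1, h2, eps=1.0):
    """Aggregate by addition after multiplying the second vector by eps."""
    return [a + eps * b for a, b in zip(h1, h2)]


# Two different (first, second) representation pairs.
pair_a = ([1.0, 2.0], [3.0, 1.0])
pair_b = ([3.0, 1.0], [1.0, 2.0])

# Plain addition maps both pairs to the same vector: information is lost.
plain_a, plain_b = add(*pair_a), add(*pair_b)

# Scaling the second representation by sqrt(2) keeps the pairs apart.
eps = math.sqrt(2.0)
scaled_a, scaled_b = add(*pair_a, eps=eps), add(*pair_b, eps=eps)
```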

In addition, the specific value may be derived based on a learnable parameter. In this case, as training for generating the integrated embedding representation proceeds, an optimal value that prevents loss of information (or expressive power) of the embedding representations can be derived naturally and accurately.

In addition, various embedding representations (e.g., embedding vectors) of the target graph may be aggregated through an addition operation. In this case, the size of the integrated embedding representation (e.g., the dimensionality of the integrated embedding vector) does not increase even when the number of embedding representations increases, so the problem of growing complexity in the associated task (e.g., a downstream task such as classification or regression) can be easily resolved.
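The size argument can be checked directly. The sketch below compares concatenation with element-wise addition for four 8-dimensional vectors; the counts and the helper names `concatenate` and `add_all` are arbitrary choices for illustration, and the per-representation multiplication is omitted because it does not affect the size.

```python
def concatenate(representations):
    """Integrate by concatenation: the size grows with the number of inputs."""
    return [value for rep in representations for value in rep]


def add_all(representations):
    """Integrate by element-wise addition: the size stays fixed."""
    return [sum(values) for values in zip(*representations)]


representations = [[0.1] * 8 for _ in range(4)]  # four 8-dimensional vectors

concat_dim = len(concatenate(representations))  # 4 * 8 entries
sum_dim = len(add_all(representations))         # still 8 entries
```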

In addition, embedding representations of different sizes (e.g., embedding matrices) can also be easily integrated through a resizing operation implemented based on a multi-layer perceptron or the like.
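One way to picture the resizing operation is a learned projection that maps each row of the smaller matrix to the width of the larger one. The sketch below uses a single randomly initialized linear layer as a stand-in; an actual multi-layer perceptron would stack several such layers with non-linearities and train the weights. `make_linear` and `resize` are hypothetical helper names.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Randomly initialized weight matrix of shape (in_dim, out_dim)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)] for _ in range(in_dim)]


def resize(matrix, weight):
    """Project every row of `matrix` from in_dim to out_dim (matrix product)."""
    columns = list(zip(*weight))
    return [[sum(x * w for x, w in zip(row, col)) for col in columns] for row in matrix]


m1 = [[1.0, 2.0, 3.0, 4.0] for _ in range(3)]  # 3 nodes x 4 dimensions
m2 = [[1.0, 0.0] for _ in range(3)]            # 3 nodes x 2 dimensions

# After resizing, m2 has the same shape as m1 and the two can be added.
m2_resized = resize(m2, make_linear(in_dim=2, out_dim=4))
```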

In addition, a powerful integrated embedding representation of the target graph can be generated by integrating a node-information-centered embedding representation with a topology-centered embedding representation, and using the integrated embedding representation can also greatly improve the accuracy of various graph-related tasks (e.g., downstream tasks such as classification and regression).

In addition, the second embedding representation may be generated by extracting topology information of the graph from the first embedding representation output by a neighbor-node-information aggregating module. In this case, the neighbor-node-information aggregating module functions as a kind of shared neural network, so an integrated GNN that generates (outputs) the integrated embedding representation can be easily constructed.

The effects according to the technical idea of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art from the description below.

FIGS. 1 and 2 are exemplary diagrams schematically illustrating the operation of a graph embedding system according to some embodiments of the present disclosure.
FIGS. 3 and 4 are exemplary diagrams illustrating a GNN (Graph Neural Network) of the neighbor-node-information aggregating type that may be referenced in some embodiments of the present disclosure.
FIGS. 5 and 6 are exemplary diagrams illustrating a GNN (Graph Neural Network) of the topology-information extraction type that may be referenced in some embodiments of the present disclosure.
FIG. 7 is an exemplary flowchart illustrating a graph embedding method according to some embodiments of the present disclosure.
FIG. 8 is an exemplary diagram further explaining a graph embedding method according to some embodiments of the present disclosure.
FIG. 9 is an exemplary diagram illustrating a training process for graph embedding according to some embodiments of the present disclosure.
FIG. 10 is an exemplary diagram illustrating a graph embedding method according to some other embodiments of the present disclosure.
FIG. 11 is an exemplary diagram illustrating a graph embedding method according to still other embodiments of the present disclosure.
FIGS. 12 and 13 are exemplary diagrams illustrating a graph embedding method according to yet other embodiments of the present disclosure.
FIG. 14 illustrates an exemplary computing device capable of implementing a graph embedding system according to some embodiments of the present disclosure.

Hereinafter, various embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. The advantages and features of the present disclosure, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the technical idea of the present disclosure is not limited to the following embodiments and may be implemented in various different forms. The following embodiments are provided only to make the technical idea of the present disclosure complete and to fully inform those of ordinary skill in the art to which the present disclosure belongs of its scope, and the technical idea of the present disclosure is defined only by the scope of the claims.

In describing various embodiments of the present disclosure, a detailed description of a related known configuration or function is omitted where it is judged that it could obscure the gist of the present disclosure.

Unless otherwise defined, terms (including technical and scientific terms) used in the following embodiments may be used with the meanings commonly understood by those of ordinary skill in the art to which the present disclosure belongs, although these meanings may vary depending on the intent of practitioners in the field, precedent, the emergence of new technologies, and the like. The terms used in the present disclosure are for describing the embodiments and are not intended to limit the scope of the present disclosure.

Singular expressions used in the following embodiments include the plural unless the context clearly indicates the singular. Likewise, plural expressions include the singular unless the context clearly indicates the plural.

In addition, terms such as first, second, A, B, (a), and (b) used in the following embodiments serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the components.

Hereinafter, various embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

FIG. 1 is an exemplary diagram schematically illustrating the operation of a graph embedding system 10 according to some embodiments of the present disclosure.

As shown in FIG. 1, the graph embedding system 10 may be a computing device/system capable of generating an embedding representation (e.g., 16) of a graph 13. Specifically, the graph embedding system 10 may integrate various embedding representations 14 and 15 of the graph 13 to generate an integrated embedding representation 16 of that graph 13. FIG. 1 shows, as an example, a case in which the graph embedding system 10 uses (or integrates) two GNNs 11 and 12 and/or two embedding representations 14 and 15, but the number of GNNs (Graph Neural Networks) and/or embedding representations used (or integrated) may be three or more (see FIG. 11). Hereinafter, for convenience of description, the graph embedding system 10 is abbreviated as the "embedding system 10".

Each of the embedding representations 14 and 15 may mean a representation of the graph 13 in vector or matrix form. Here, a matrix (e.g., a matrix of three or more dimensions) may encompass the concept of a tensor. Depending on the case (or form), the term "embedding representation" may be used interchangeably with terms such as "embedding vector", "embedding matrix", "embedding code", and "latent representation".

As shown, the first embedding representation 14 may contain graph information that differs at least in part from that of the second embedding representation 15, or the two may be generated through different GNNs 11 and 12. For example, the first embedding representation 14 may be generated through a first GNN 11, and the second embedding representation 15 may be generated through a second GNN 12 whose expressive power differs from that of the first GNN 11. In this case, as shown in FIG. 2, the embedding system 10 may integrate the embedding representations 14 and 15 to generate an integrated embedding representation 16 that has stronger expressive power (or carries richer information). As a more specific example, assume that the first embedding representation 14 is generated through a first GNN 11 of the neighbor-node-information aggregating type and that the second embedding representation 15 is generated through a second GNN 12 of the topology-information extraction type. In this case, the embedding system 10 may integrate the two embedding representations 14 and 15 to generate an integrated embedding representation 16 that reflects both neighbor node information and topology information. For further details, refer to the descriptions of FIGS. 3 to 6, 12, and 13.

For reference, the embedding representations 14 and 15 may correspond to the final output of a GNN or may be derived during the internal processing of a GNN. In addition, the first GNN 11 and the second GNN 12 may refer to separate models or to different parts of one overall model (e.g., the integrated model illustrated in FIG. 12 or FIG. 13). Furthermore, the GNNs 11 and 12 may or may not be trained models.

In addition, saying that GNNs (or embedding representations) differ in expressive power may mean, for example, that they generate (extract) different embedding representations for the same graph (e.g., embedding representations carrying different information), or that the sets of graphs they can distinguish differ. As a more specific example, assume that a graph classification task is performed using the embedding representations of a first GNN and a second GNN. Assume further that, using the embedding representation of the first GNN (e.g., a GNN of the neighbor-node-information aggregating type), a first graph and a second graph are distinguished (i.e., classified into different classes) but a third graph and a fourth graph are not. Conversely, assume that, using the embedding representation of the second GNN (e.g., a GNN of the topology-information extraction type), the first graph and the second graph are not distinguished but the third graph and the fourth graph are. In this case, the expressive power of the first GNN can be said to differ from that of the second GNN.

In addition, saying that the expressive power of a first GNN (or a first embedding representation) is stronger than that of a second GNN (or a second embedding representation) may mean, for example, that the embedding representation of the first GNN (or the first embedding representation) contains richer information than that of the second GNN (or the second embedding representation), or that the set of graphs the first GNN can distinguish is larger than that of the second GNN. As a more specific example, assume that a graph classification task is performed using the embedding representations of the first GNN and the second GNN. Assume further that all 10 graphs are distinguished when the embedding representation of the first GNN is used, but some of the same 10 graphs are not distinguished when the embedding representation of the second GNN is used. In this case, the expressive power of the first GNN can be said to be stronger than that of the second GNN, and the first GNN can be understood as the better model in that it generates embedding representations that reflect changes in the input graph more faithfully than the second GNN does.

The embedding system 10 may train the modules (e.g., GNNs, a resizing module) and parameters needed to generate the integrated embedding representation 16 by performing a predetermined task (e.g., a classification task or a regression task). If the target task of the integrated embedding representation 16 is determined in advance, the embedding system 10 may perform the training using that target task.

In addition, the embedding system 10 may perform the target task directly using the trained modules and parameters, or may provide the integrated embedding representation (e.g., 16) of an input graph to a device that performs the target task.

The specific method by which the embedding system 10 generates the integrated embedding representation will be described in detail later with reference to FIG. 7 and the subsequent drawings.

The embedding system 10 may be implemented with at least one computing device. For example, all functions of the embedding system 10 may be implemented in a single computing device, or a first function of the embedding system 10 may be implemented in a first computing device and a second function in a second computing device. Alternatively, a particular function of the embedding system 10 may be implemented across a plurality of computing devices.

The computing device may include any device having a computing function; see FIG. 14 for an example of such a device. Because a computing device is a collection of interacting components (e.g., memory, a processor, etc.), it may in some cases be called a 'computing system'. A computing system may also refer to a collection of interacting computing devices.

So far, the operation of the embedding system 10 according to some embodiments of the present disclosure has been outlined with reference to FIGS. 1 and 2. In the following, for ease of understanding, GNNs of the neighbor-node-information aggregating type and of the topology-information extraction type are briefly described with reference to FIGS. 3 to 6.

First, the structure and operation of a GNN of the neighbor-node-information aggregating type are briefly described with reference to FIGS. 3 and 4.

FIG. 3 illustrates a GNN 30 of the neighbor-node-information aggregating type that may be referenced in some embodiments of the present disclosure.

As shown in FIG. 3, the illustrated GNN 30 may operate to receive a graph 33 (more precisely, graph data) and to generate an embedding representation 34 of the input graph 33 by repeatedly aggregating the information (e.g., features) of neighboring nodes. To this end, the GNN 30 may be configured to include a plurality of blocks 31-1 to 31-n and a pooling module 32, and may further include other modules in some cases.

The plurality of blocks 31-1 to 31-n may repeatedly aggregate the information of the neighboring nodes constituting the graph 33. Each block (e.g., 31-1) may be composed of, for example, a plurality of multi-layer perceptrons (i.e., fully connected layers), but the scope of the present disclosure is not limited thereto.

Next, the pooling module 32 may perform a pooling operation to generate (output) an embedding representation 34 of an appropriate size. Those skilled in the art will already be familiar with the pooling operations used in GNNs, so a description is omitted. Depending on the case, the pooling module 32 may also be referred to by terms such as 'pooling layer', 'readout layer', and 'readout module'.
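As a rough illustration of the flow just described (repeated neighbor aggregation followed by pooling), the sketch below uses plain sums. It is a simplification under stated assumptions: the learned multi-layer perceptrons inside each block are omitted, sum pooling is only one of several possible readouts, and the function names are hypothetical.

```python
def aggregate_once(features, adjacency):
    """One round: each node's new feature is its own feature plus its neighbors' features."""
    updated = []
    for node, own in enumerate(features):
        acc = list(own)
        for neighbor in adjacency[node]:
            acc = [a + b for a, b in zip(acc, features[neighbor])]
        updated.append(acc)
    return updated

def sum_pool(features):
    """Readout: collapse per-node features into one graph-level vector."""
    return [sum(column) for column in zip(*features)]

def embed(features, adjacency, rounds=2):
    hidden = features
    for _ in range(rounds):
        hidden = aggregate_once(hidden, adjacency)
    return sum_pool(hidden)

# Path graph 0 - 1 - 2 with 2-dimensional node features.
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
graph_vector = embed(node_features, neighbors, rounds=2)
```

The output length equals the feature dimension regardless of the number of nodes, which is the role the pooling module plays in FIG. 3.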

The embedding representation 34 generated by the illustrated GNN 30 usually reflects the node information of the graph 33 well, but does not reflect the topology information of the graph 33 (e.g., the overall form/shape of the graph) equally well. Consequently, using this embedding representation 34 inevitably degrades performance on tasks in which topology information plays an important role.

A more detailed example of the GNN 30 described above is shown in FIG. 4.

The GNN 40 illustrated in FIG. 4 may operate to receive a feature matrix for node tuples (e.g., a three-dimensional matrix of size v*v*d when the number of nodes is 'v' and the feature dimensionality is 'd') and to generate (output) an embedding vector 45 (e.g., a 'p'-dimensional embedding vector).

Specifically, the GNN 40 may generate an embedding matrix for the node tuples (e.g., a three-dimensional matrix of size v*v*p) by repeatedly aggregating the input feature matrix through a plurality of blocks 41-1 to 41-n, and may finally generate (output) the embedding vector 45 for the input graph through a pooling module 42.

For reference, GNNs such as PPGN (Provably Powerful Graph Network) operate in a manner similar to FIG. 4; for details, see the paper titled 'Provably Powerful Graph Network'.

Next, the structure and operation of a GNN of the topology-information extraction type are briefly described with reference to FIGS. 5 and 6.

FIG. 5 illustrates a GNN 50 of the topology-information extraction type that may be referenced in some embodiments of the present disclosure.

As shown in FIG. 5, the illustrated GNN 50 may operate to generate an embedding representation 55 of an input graph 54 by reflecting topology information extracted from that graph 54. In FIG. 5 and the subsequent drawings, a shaded embedding representation (e.g., 55) denotes an embedding representation in which topology information is reflected.

The GNN 50 may be configured to include a plurality of blocks 51-1 to 51-n, a topology information extraction module 52, and a pooling module 53, and may further include other modules in some cases.

As with the GNN 30 illustrated in FIG. 3, the plurality of blocks 51-1 to 51-n may repeatedly aggregate the information of the neighboring nodes constituting the graph 54. Each block (e.g., 51-1) may be composed of, for example, a plurality of multi-layer perceptrons (i.e., fully connected layers), but the scope of the present disclosure is not limited thereto.

Next, the topology information extraction module 52 may extract the topology information of the input graph 54. For example, the topology information extraction module 52 may extract the topology information by computing a persistence diagram. Those skilled in the art will already be familiar with the concept of a persistence diagram and how it is computed, so a description is omitted.
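For readers who want a concrete picture, the following is a minimal sketch of one standard way to compute a 0-dimensional persistence diagram of a graph. It assumes a vertex (sublevel-set) filtration with one scalar value per node, uses union-find with the elder rule, and omits zero-persistence pairs. The function name and the filtration values are hypothetical; the disclosure does not specify which filtration or homology dimension the module 52 uses.

```python
import math

def zero_dim_persistence(values, edges):
    """(birth, death) pairs of connected components under a vertex filtration."""
    n = len(values)
    adjacency = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    parent = list(range(n))
    birth = list(values)  # birth value of the component rooted at each node
    added = [False] * n
    pairs = []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add nodes in order of increasing filtration value.
    for v in sorted(range(n), key=lambda i: values[i]):
        added[v] = True
        for neighbor in adjacency[v]:
            if not added[neighbor]:
                continue
            younger, older = find(v), find(neighbor)
            if younger == older:
                continue
            if birth[younger] < birth[older]:
                younger, older = older, younger
            # Elder rule: the component born later dies at the current value.
            if birth[younger] < values[v]:
                pairs.append((birth[younger], values[v]))
            parent[younger] = older

    # Components that never merge persist forever.
    roots = {find(i) for i in range(n)}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)

# Path graph 0 - 1 - 2 with assumed filtration values per node.
diagram = zero_dim_persistence([0.0, 2.0, 1.0], [(0, 1), (1, 2)])
```

In this example two components are born at 0.0 and 1.0; they merge when node 1 enters at 2.0, so the younger one yields the pair (1.0, 2.0) and the older one persists indefinitely.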

Next, the pooling module 53 may perform a pooling operation to generate (output) an embedding representation 55 of an appropriate size.

The embedding representation 55 generated by the illustrated GNN 50 usually reflects the topology information of the graph 54 well, but does not reflect the node information of the graph 54 equally well. Consequently, using this embedding representation 55 inevitably degrades performance on tasks in which the node information of the graph plays an important role.

A more detailed example of the GNN 50 described above is shown in FIG. 6.

The GNN 60 illustrated in FIG. 6 may operate to receive a feature matrix for nodes (e.g., a two-dimensional matrix of size v*d when the number of nodes is 'v' and the feature dimensionality is 'd') and to output an embedding vector 65 (e.g., a 't'-dimensional embedding vector).

Specifically, the GNN 60 may generate an embedding matrix for the nodes by repeatedly aggregating the input feature matrix through a plurality of blocks 61-1 to 61-n. The GNN 60 may then extract topology information from the node embedding matrix through a persistence diagram calculation module 62 and reflect the extracted information to generate an embedding matrix (65, e.g., a two-dimensional matrix of size v*t), and may finally generate (output) the embedding vector 65 for the input graph through a pooling module 63.

For reference, GNNs such as GFL (Graph Filtration Learning) operate in a manner similar to FIG. 6; for details, see the paper titled 'Graph Filtration Learning'.

So far, the GNNs 30 to 60 that may be referenced in some embodiments of the present disclosure have been described with reference to FIGS. 3 to 6. Hereinafter, various methods that may be performed in the embedding system 10 described above are described with reference to FIG. 7 and the subsequent drawings. For ease of understanding, the description assumes that all steps/operations of the methods described below are performed by the embedding system 10. Accordingly, where the subject of a particular step/operation is omitted, it may be understood as being performed by the embedding system 10. In an actual environment, however, some steps/operations of the methods described below may be performed by another computing device.

First, a graph embedding method according to some embodiments of the present disclosure is described in detail with reference to FIGS. 7 to 9.

This embodiment relates to a method of integrating two embedding representations of a target graph. A method of integrating three or more embedding representations is described later with reference to FIG. 11.

FIG. 7 is an exemplary flowchart illustrating a graph embedding method according to some embodiments of the present disclosure. It is merely a preferred embodiment for achieving the purpose of the present disclosure, and some steps may of course be added or removed as needed.

As shown in FIG. 7, this embodiment may begin at step S71, in which a first embedding representation and a second embedding representation of the target graph are obtained. For example, as shown in FIG. 8, the embedding system 10 may obtain a first embedding representation 83 through a first GNN 81 and a second embedding representation 84 through a second GNN 82 (e.g., a GNN whose expressive power differs from that of the first GNN). Here, the first GNN 81 may be a GNN of the neighbor-node-information aggregating type (see FIGS. 3 and 4) and the second GNN 82 may be a GNN of the topology-information extraction type (see FIGS. 5 and 6), but the scope of the present disclosure is not limited thereto. Alternatively, the first embedding representation 83 may be one generated by an embedding scheme that aggregates the information (e.g., feature information) of neighboring nodes of the target graph (i.e., a node-information-centered embedding representation), and the second embedding representation 84 may be one generated by an embedding scheme that reflects the topology information of the target graph (i.e., a topology-information-centered embedding representation). However, the scope of the present disclosure is not limited thereto.

As described above, the first embedding representation and the second embedding representation may correspond to the final output of a GNN or may be derived during the internal processing of a GNN. For example, the first and second embedding representations may be embedding vectors obtained as a result of a pooling operation, or may be embedding matrices (e.g., two-dimensional/three-dimensional matrices) derived before the pooling operation. For reference, if the first and second embedding representations are embedding vectors obtained as a result of a pooling operation, the pooling operation illustrated in FIG. 8 may not be performed.

In step S72, a specific value (e.g., a scalar value) may be reflected in the second embedding representation, and the value of the second embedding representation may change as a result. For example, as shown in FIG. 8, the embedding system 10 may reflect the specific value (e.g., '+ε') in the second embedding representation 84 through a multiplication operation (e.g., by multiplying each element of the embedding matrix by the specific value). The reason for reflecting the specific value in the second embedding representation 84 is to prevent the values of the first embedding representation 83 and the second embedding representation 84 from canceling each other out during aggregation (e.g., an addition operation). When values cancel out, the information (or expressive power) inherent in the two embedding representations 83 and 84 is lost.
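One way to read the cancellation concern is that plain addition can map different pairs of embeddings to the same sum. The sketch below illustrates this with made-up vectors; `combine` is a hypothetical helper, and the scale √2 is chosen only for illustration.

```python
import math

def combine(first, second, scale=1.0):
    """Element-wise first + scale * second."""
    return [a + scale * b for a, b in zip(first, second)]

# Two different (first, second) pairs, e.g. from two different graphs.
pair_a = ([1.0, 2.0], [3.0, 0.0])
pair_b = ([3.0, 0.0], [1.0, 2.0])

# Plain addition collapses both pairs to the same vector.
plain_a = combine(*pair_a)
plain_b = combine(*pair_b)

# Scaling the second embedding before adding keeps them apart.
scale = math.sqrt(2.0)
scaled_a = combine(*pair_a, scale=scale)
scaled_b = combine(*pair_b, scale=scale)
```

With plain addition both pairs give [4.0, 2.0], so a downstream task cannot tell the two inputs apart; after scaling, the two results differ.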

FIG. 8 shows, as an example, a case where the specific value is reflected only in the second embedding representation 84, but the scope of the present disclosure is not limited thereto. For example, the embedding system 10 may reflect the specific value in the first embedding representation 83 instead of the second embedding representation 84. Alternatively, the embedding system 10 may reflect a first specific value in the first embedding representation 83 and a second specific value (i.e., a value different from the first specific value) in the second embedding representation 84.

Meanwhile, the specific manner of deriving/generating the specific value may vary depending on the embodiment.

In some embodiments, the specific value may be a preset value. For example, the specific value may be a value based on a kind of hyperparameter (e.g., 'ε') and may be set in advance by a user. As a more concrete example, the specific value may be set to an irrational number. This is because, when either (or both) of the two embedding representations is multiplied by an irrational number, the values of the two embedding representations can be effectively prevented from canceling each other out during aggregation (e.g., an addition operation).
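A short argument gives some intuition for why an irrational multiplier helps. It rests on a simplifying assumption that is not stated in the source, namely that all embedding entries are rational, and it is not the inventors' proof. Let $a, a'$ be corresponding entries of two first embedding representations, $b, b'$ the corresponding entries of two second embedding representations, and $c$ an irrational multiplier.

```latex
\begin{aligned}
a + c\,b = a' + c\,b' \;&\Longrightarrow\; a - a' = c\,(b' - b), \\
b \neq b' \;&\Longrightarrow\; c = \frac{a - a'}{b' - b} \in \mathbb{Q}.
\end{aligned}
```

The second line contradicts the irrationality of $c$, so $b = b'$ and therefore $a = a'$. Under this assumption, equal sums imply equal inputs, which means the pair $(a, b)$ can be recovered from $a + c\,b$ and no information is lost.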

In some other embodiments, the specific value may be a value derived from a learnable parameter (e.g., when the learnable parameter is 'ε', the specific value may be the value of ε itself, ε plus an irrational number, etc.). For example, the embedding system 10 may predict a label for a predetermined task (i.e., a training task) using the integrated embedding representation and update the value of the learnable parameter based on the difference between the predicted label and the correct label (i.e., the correct label of the target graph). In this case, as training for generating the integrated embedding representation proceeds, an optimal value that prevents loss of the information (or expressive power) of the embedding representations can be derived naturally and accurately.
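The update of a learnable ε can be sketched with a single scalar parameter and hand-written gradient descent. Several things here are assumptions made for illustration: the multiplier takes the form 1 + ε, the prediction is a fixed linear readout, and the loss is a squared error on a scalar target rather than a classification loss.

```python
def unified(h1, h2, eps):
    """Integrated embedding: h1 + (1 + eps) * h2, element-wise."""
    return [a + (1.0 + eps) * b for a, b in zip(h1, h2)]

def predict(weights, vector):
    """Hypothetical linear prediction layer with fixed weights."""
    return sum(w * x for w, x in zip(weights, vector))

def train_eps(h1, h2, weights, target, eps=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        error = predict(weights, unified(h1, h2, eps)) - target
        # d(error^2)/d(eps) = 2 * error * sum_i(weights_i * h2_i)
        grad = 2.0 * error * predict(weights, h2)
        eps -= lr * grad
    return eps

# Prediction is 2 + eps for these inputs, so a target of 3 is reached at eps = 1.
learned_eps = train_eps(h1=[1.0, 0.0], h2=[0.0, 1.0], weights=[1.0, 1.0], target=3.0)
```

In a full system the GNNs, the resizing module, and the prediction layer would be trained jointly by backpropagation; only the ε update is isolated here.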

In step S73, an integrated embedding representation of the target graph may be generated by aggregating the first embedding representation and the changed second embedding representation. For example, as illustrated in FIG. 8, the embedding system 10 may perform a pooling operation on each of the first embedding representation 83 and the changed second embedding representation (not shown), and aggregate the pooling results 85 and 86 to generate an integrated embedding representation 87 (e.g., an embedding vector). As described above, however, if the first embedding representation 83 and the second embedding representation 84 were themselves obtained as a result of a pooling operation (e.g., if they are embedding vectors), the pooling operation among the operations illustrated in FIG. 8 may be omitted.

FIG. 8 shows, as an example, a case where the pooled embedding representations 85 and 86 are aggregated through an addition operation, but the scope of the present disclosure is not limited thereto. Note, however, that the inventors of the present disclosure have mathematically proven that integrating (aggregating) the embedding vectors 83 and 84 using a multiplication operation and an addition operation, as illustrated in FIG. 8, preserves the information (or expressive power) inherent in the embedding representations 83 and 84. Moreover, when an addition operation is used, the size of the integrated embedding representation 87 (e.g., the dimensionality of the integrated embedding vector) does not grow even as the number of embedding representations being integrated (e.g., 83, 84) increases, so the problem of increased complexity in the associated task (e.g., a downstream task such as classification or regression) is readily avoided.
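The size argument can be checked with a small comparison. Concatenation is used below only as a contrasting aggregation scheme; it is an assumption for illustration and is not discussed in the source. The helper names are hypothetical.

```python
def aggregate_sum(vectors):
    """Element-wise sum: output length equals the input vector length."""
    return [sum(column) for column in zip(*vectors)]

def aggregate_concat(vectors):
    """Concatenation: output length grows with the number of vectors."""
    return [x for vector in vectors for x in vector]

pooled = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 0.0, 2.0, 0.0],
]
summed = aggregate_sum(pooled)
concatenated = aggregate_concat(pooled)
```

With three 4-dimensional vectors, the summed result stays 4-dimensional while the concatenated result has 12 entries, so a downstream prediction layer fed by the sum does not need to grow as more embeddings are integrated.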

Meanwhile, although not shown in FIG. 7, the embedding system 10 may train the modules and parameters needed to generate the integrated embedding representation by performing a predetermined task (i.e., a training task). Specifically, as shown in FIG. 9, the embedding system 10 may input the integrated embedding representation 87 into a task-specific prediction module 91 (or 'prediction layer') to predict a label 92 (e.g., a class label) for the target graph. The embedding system 10 may then train the relevant modules/parameters (e.g., the prediction module 91, the parameter used to derive the specific value, the resizing module illustrated in FIG. 10, the first GNN 81, the second GNN 82, etc.) based on the difference (e.g., a loss 94) between the predicted label 92 and the correct label 93 for the target graph. This training process may of course be repeated over a plurality of graphs for which correct labels (e.g., 93) are given (i.e., a training set).

For reference, when the predetermined task is a classification task, the prediction module 91 may be implemented as, for example, a neural network layer (e.g., a fully connected layer) configured to predict a class label. However, the scope of the present disclosure is not limited thereto, and the detailed structure of the prediction module 91 may be modified freely depending on the task.

So far, a graph embedding method according to some embodiments of the present disclosure has been described with reference to FIGS. 7 to 9. According to the above, an integrated embedding representation of a target graph may be generated by reflecting a specific value in at least some of the various embedding representations of the target graph and then aggregating them. In this case, the specific value prevents the values of different embedding representations from canceling each other out during aggregation, so the various embedding representations can be integrated without loss of information (or expressive power). For example, multiplying a particular embedding representation by an irrational number effectively prevents the values of different embedding representations from canceling out during aggregation (e.g., addition). In addition, the various embedding representations (e.g., embedding vectors) of the target graph may be aggregated through an addition operation. In this case, the size of the integrated embedding representation (e.g., the dimensionality of the integrated embedding vector) does not grow even as the number of embedding representations increases, so the problem of increased complexity in the associated task (e.g., a downstream task such as classification or regression) is readily avoided.

Hereinafter, a graph embedding method according to some other embodiments of the present disclosure is described with reference to FIG. 10. For clarity of the present disclosure, descriptions of content that overlaps with the preceding embodiments are omitted.

As shown in FIG. 10, this embodiment relates to a method of generating an integrated embedding representation 107 by additionally using a resizing operation (or module) when the first embedding representation 103 and the second embedding representation 104 differ in size (e.g., the size of the embedding matrix or the dimensionality of the embedding vector). FIG. 10 assumes that the embedding representations 103 and 104 are embedding matrices generated through different GNNs 101 and 102.

Specifically, the embedding system 10 may perform a resizing operation on the first embedding representation 103 and the second embedding representation 104. For example, when the two embedding representations 103 and 104 are in matrix form, the embedding system 10 may adjust their sizes through an operation that changes the size of a matrix. FIG. 10 illustrates a case where the resizing operation is performed to match the lengths of the last dimensions (e.g., 'p₁', 'p₂') of the two embedding matrices 103 and 104 (or the dimensionalities of the embedding vectors 105 and 106 obtained as a result of the pooling operation). However, the scope of the present disclosure is not limited thereto. FIG. 10 also shows, as an example, a case where the resizing operation is implemented with a multi-layer perceptron. A multi-layer perceptron can naturally change the size of an input embedding matrix by multiplying it with a weight matrix (i.e., its own weight parameters). However, the scope of the present disclosure is not limited thereto, and the resizing operation may be implemented in other ways.
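The resizing step can be sketched as a matrix multiplication that changes only the last dimension. For brevity this uses a single linear layer with constant weights instead of a trained multi-layer perceptron, and two-dimensional matrices of sizes v*p₁ and v*p₂; these choices and all names are assumptions made for illustration.

```python
def matmul(a, b):
    """Plain matrix product of nested lists: (v x p_in) @ (p_in x p_out)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def shape(matrix):
    return (len(matrix), len(matrix[0]))

def constant_weight(rows, cols, value):
    # Stand-in for learned weight parameters.
    return [[value] * cols for _ in range(rows)]

v, p1, p2, p_common = 3, 4, 2, 5
first = [[1.0] * p1 for _ in range(v)]    # v x p1 embedding matrix
second = [[2.0] * p2 for _ in range(v)]   # v x p2 embedding matrix

resized_first = matmul(first, constant_weight(p1, p_common, 0.5))
resized_second = matmul(second, constant_weight(p2, p_common, 0.25))
```

After resizing, both matrices have shape (3, 5), so they can be pooled and added element-wise.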

Next, the embedding system 10 may perform operations such as reflecting the specific value and pooling, and may aggregate the pooling results 105 and 106 to generate the integrated embedding representation 107. For example, the embedding system 10 may perform an addition operation on the embedding vectors 105 and 106 obtained as a result of the pooling operation to generate an integrated embedding representation 107 in vector form.

Meanwhile, FIG. 10 shows, as an example, a case where the resizing operation is performed on both embedding representations 103 and 104, but the resizing operation may of course be performed on only one of them in some cases.

FIG. 10 also shows, as an example, a case where the resizing operation is performed before the pooling operation, but the resizing operation may be performed after the pooling operation in some cases. For instance, the resizing operation may be performed on the embedding vectors (e.g., 105, 106) obtained as a result of the pooling operation.

Likewise, FIG. 10 shows, as an example, a case where the resizing operation is performed before the specific value (e.g., '1+ε') is reflected, but the resizing operation may be performed after the specific value (e.g., '1+ε') is reflected in some cases.

Meanwhile, in some embodiments, a multi-layer perceptron may be applied even when the first embedding representation 103 and the second embedding representation 104 have the same size. In this case, the multi-layer perceptron may serve to transform a given embedding representation (e.g., 103, 104) into a representation in an appropriate embedding space (e.g., the space of the other embedding representation, a joint embedding space, etc.). In such embodiments, the multi-layer perceptron may therefore be referred to by terms such as 'transformation module', 'transformation layer', 'projection module', or 'projection layer'.

So far, a graph embedding method according to some other embodiments of the present disclosure has been described with reference to FIG. 10. According to the above, embedding representations of different sizes (e.g., embedding matrices) can also be readily integrated through a resizing operation implemented with a multi-layer perceptron or the like.

이하에서는, 도 11을 참조하여 본 개시의 또 다른 몇몇 실시예들에 따른 그래프 임베딩 방법에 대하여 설명하도록 한다. 다만, 본 개시의 명료함을 위해, 앞선 실시예들과 중복되는 내용에 대한 설명은 생략하도록 한다.Hereinafter, a graph embedding method according to some other embodiments of the present disclosure will be described with reference to FIG. 11. However, for clarity of the present disclosure, description of content that overlaps with the previous embodiments will be omitted.

도 11에 도시된 바와 같이, 본 실시예는 K개(e.g., 3개 이상)의 임베딩 표현들(112-1 내지 112-k)을 애그리게이팅(통합)하여 타깃 그래프에 대한 통합 임베딩 표현(116)을 생성하는 방법에 관한 것이다. 도 11은 K개의 임베딩 표현들(112-1 내지 112-k)이 서로 다른 GNN들(111-1 내지 111-k)을 통해 생성된 임베딩 매트릭스인 경우를 가정하고 있으며, 사이즈 또한 서로 다른 경우를 가정하고 있다.As shown in Figure 11, this embodiment aggregates (integrates) K (e.g., 3 or more) embedding expressions 112-1 to 112-k to create an integrated embedding expression 116 for the target graph. ) is about how to create. Figure 11 assumes the case where K embedding representations (112-1 to 112-k) are embedding matrices generated through different GNNs (111-1 to 111-k), and the sizes are also different. I am assuming.

Specifically, the embedding system 10 may perform a resizing operation on the K embedding representations 112-1 to 112-k and reflect specific values 113 and 114 in them. Here, the specific values (e.g., 113, 114) reflected in the respective embedding representations (e.g., 112-2, 112-k) may be mutually different values (e.g., irrational numbers) or values based on mutually different learnable parameters. For example, the specific value 113 reflected in the second embedding representation 112-2 may be a first irrational number, and the specific value 114 reflected in the K-th embedding representation 112-k may be a second irrational number (i.e., a value different from the first irrational number). In such a case, the values of the different embedding representations 112-1 to 112-k can be effectively prevented from canceling one another out during the aggregation (e.g., addition) process.
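The cancellation issue can be illustrated with a small numerical sketch. The three vectors below are made-up values chosen so that they sum to zero under plain addition. The scale factors 1, √2, and √3 are merely one assumed choice of mutually different irrational-based values, and the helper name `aggregate` is hypothetical. Note that floating-point numbers only approximate irrational numbers, so this is a demonstration of the idea rather than a proof.

```python
import numpy as np

def aggregate(embeddings, scales):
    """Multiply each embedding by its own scale factor, then add them up."""
    return sum(s * e for s, e in zip(scales, embeddings))

# Three toy embeddings that cancel exactly under unweighted addition.
e1 = np.array([1.0, -2.0])
e2 = np.array([2.0, 1.0])
e3 = np.array([-3.0, 1.0])

# Plain addition: the information in e1, e2, e3 is lost (result is all zeros).
plain = aggregate([e1, e2, e3], [1.0, 1.0, 1.0])

# Distinct scale factors (assumed: 1, sqrt(2), sqrt(3)) avoid the cancellation.
scaled = aggregate([e1, e2, e3], [1.0, np.sqrt(2.0), np.sqrt(3.0)])

print(plain)
print(scaled)
```

The choice of √2 and √3 relies on the fact that 1, √2, and √3 are linearly independent over the rational numbers, so a weighted sum of rational-valued entries vanishes only when every entry is zero.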

Next, the embedding system 10 may perform a pooling operation and aggregate the results 115-1 to 115-k of the pooling operation to generate the integrated embedding representation 116. For example, the embedding system 10 may perform an addition operation on the embedding vectors 115-1 to 115-k obtained as a result of the pooling operation to generate the integrated embedding representation 116 in vector form.
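The scale, pool, and add sequence described above can be sketched as follows. Sum pooling over the node axis is an assumption (the passage does not fix the pooling type), the matrices are toy values that already share a feature dimension after resizing, and the names `pool` and `integrate` are hypothetical.

```python
import numpy as np

def pool(matrix):
    """Sum-pool over the row (node) axis: (n, p) -> (p,). Pooling type is assumed."""
    return matrix.sum(axis=0)

def integrate(matrices, scales):
    """Scale each matrix, pool it to a vector, then add the vectors."""
    pooled = [pool(s * m) for m, s in zip(matrices, scales)]
    return np.sum(pooled, axis=0)

# Toy matrices with different row counts but a common feature dimension p = 3.
m1 = np.ones((4, 3))
m2 = np.full((6, 3), 2.0)
m3 = np.full((2, 3), -1.0)

integrated = integrate([m1, m2, m3], [1.0, np.sqrt(2.0), np.sqrt(3.0)])
print(integrated.shape)
```

Because pooling removes the row axis, the matrices may keep different numbers of rows; only the feature dimension has to match before the addition.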

So far, a graph embedding method according to still other embodiments of the present disclosure has been described with reference to FIG. 11. Hereinafter, a graph embedding method according to still other embodiments of the present disclosure will be described with reference to FIGS. 12 and 13. For clarity of the present disclosure, descriptions overlapping with the preceding embodiments are omitted.

As shown in FIG. 12, the present embodiment relates to a method of using an integrated GNN to generate an integrated embedding representation 125 that reflects both the neighboring node information and the topology information of a target graph 121. The structure of the integrated GNN illustrated in FIG. 12 can be understood as having been derived from the observation that GNNs of the two approaches (i.e., the neighboring node information aggregating approach and the topology information extraction approach; see GNNs 30 and 50 illustrated in FIGS. 3 and 5) have in common a module (i.e., a neural network) for aggregating neighboring node information. The integrated GNN illustrated in FIG. 12 may also be described as a first GNN (e.g., GNN 30 of FIG. 3) and a second GNN (e.g., GNN 50 of FIG. 5) sharing the node information aggregating module 122.

Specifically, the embedding system 10 may generate a first embedding representation 123 or 124 for the target graph 121 through the neighboring node information aggregating module 122. The first embedding representation 123 or 124 may be in the form of a three-dimensional matrix (see 123) or a two-dimensional matrix (see 124).

Next, the embedding system 10 may analyze the first embedding representation 123 or 124 to extract topology information and may generate a second embedding representation (not shown) that reflects the extracted topology information.

Next, the embedding system 10 may perform operations such as resizing, reflecting a specific value, and pooling on the first embedding representation 123 or 124 and the second embedding representation (not shown), and may aggregate the results to generate the integrated embedding representation 125. For example, the embedding system 10 may generate the integrated embedding representation 125 in vector form.

For ease of understanding, a further description is given with reference to FIG. 13.

FIG. 13 illustrates a more detailed example of the integrated GNN of FIG. 12. The integrated GNN shown in FIG. 13 can be understood, for example, as an integration of the GNNs 40 and 60 illustrated in FIGS. 4 and 6.

As shown in FIG. 13, the embedding system 10 may generate a first embedding representation 133 for the target graph through the neighboring node information aggregating module 132. FIG. 13 illustrates a case where the neighboring node information aggregating module 132 receives a feature matrix 131 for node tuples (e.g., a three-dimensional matrix of size v*v*d when the number of nodes is 'v' and the feature dimensionality is 'd') and outputs (generates) an embedding matrix 133 for the node tuples (e.g., a three-dimensional matrix of size v*v*p).

Next, the embedding system 10 may generate a second embedding representation 135 reflecting the topology information of the target graph by computing a persistence diagram using the first embedding representation 133. For example, when the first embedding representation 133 is a three-dimensional embedding matrix, the embedding system 10 may extract the diagonal elements of the embedding matrix 133 to generate a two-dimensional embedding matrix 134 and may compute a persistence diagram for the two-dimensional embedding matrix 134 to generate the second embedding representation 135. The diagonal elements are extracted because the diagonal elements of the three-dimensional embedding matrix 133 correspond to the portion in which the information of each node is concentrated. In some cases, however, the two-dimensional embedding matrix 134 may be generated in a different manner.
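The diagonal-extraction step can be sketched in NumPy as follows, assuming the three-dimensional embedding matrix is stored as an array of shape (v, v, p) whose entry [i, j, :] belongs to the node tuple (i, j). The function name `extract_diagonal` and the toy values are hypothetical, and the subsequent persistence-diagram computation is deliberately left out because it would require a topological data analysis library not named in the text.

```python
import numpy as np

def extract_diagonal(tuple_embedding):
    """Map a (v, v, p) node-tuple embedding to a (v, p) per-node matrix."""
    v = tuple_embedding.shape[0]
    if tuple_embedding.ndim != 3 or tuple_embedding.shape[1] != v:
        raise ValueError("expected an array of shape (v, v, p)")
    idx = np.arange(v)
    # Row i of the result is tuple_embedding[i, i, :], i.e. the tuple (i, i).
    return tuple_embedding[idx, idx, :]

v, p = 4, 3  # hypothetical number of nodes and embedding dimensionality
tuple_embedding = np.arange(v * v * p, dtype=float).reshape(v, v, p)

node_embedding = extract_diagonal(tuple_embedding)
print(node_embedding.shape)
```

Integer-array indexing is used here instead of `np.diagonal`, because `np.diagonal` would place the diagonal axis last and return a (p, v) array.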

Next, the embedding system 10 may perform operations such as resizing, reflecting a specific value, and pooling on the first embedding representation 133 and the second embedding representation 135 (see 136-1, 136-2, 137-1, 137-2), and may aggregate the results 137-1 and 137-2 to generate an integrated embedding representation 138. For example, the embedding system 10 may generate the integrated embedding representation 138 in vector form.

So far, a graph embedding method according to still other embodiments of the present disclosure has been described with reference to FIGS. 12 and 13. As described above, an integrated embedding representation reflecting both the node information and the topology information of the target graph can be readily generated through the integrated GNN, and using such an integrated embedding representation can improve overall performance on various graph-related tasks.

Hereinafter, the results of performance experiments on the graph embedding method described above (hereinafter, the 'proposed method') are briefly introduced.

The present inventors conducted an experiment to evaluate the accuracy of a graph classification task using datasets from the bioinformatics field (MUTAG, PTC, PROTEINS, NCI1), on the premise that high task accuracy indicates that the graph embedding method performs well. Specifically, the present inventors generated an integrated embedding representation through an integrated GNN as illustrated in FIG. 13 and predicted the class of the input graph based on the generated integrated embedding representation. The present inventors also performed the same experiment on 'PPGN', a representative GNN of the neighboring node information aggregating approach. The experimental results are shown in Table 1 below.

Table 1 (graph classification accuracy)

Category           MUTAG    PTC      PROTEINS   NCI1
PPGN               90.55    66.17    77.2       83.19
Proposed method    92.78    70.88    78.11      84.67

Referring to Table 1, the proposed method outperforms PPGN on all four datasets. This is believed to be because the integrated embedding representation generated by the proposed method contains not only node-centered information but also the topology information of the graph, which indicates that the proposed method can generate a more powerful embedding representation than a GNN of the neighboring node information aggregating approach. It also indicates that almost no loss of information (or expressive power) occurs in the process of generating the integrated embedding representation.

In addition, the present inventors conducted an experiment to evaluate the accuracy of a regression task using the QM9 (Quantum Machines 9) dataset from the bioinformatics field. Specifically, the present inventors generated an integrated embedding representation through an integrated GNN as illustrated in FIG. 13, predicted the values of the targets listed in Table 2 below based on the generated integrated embedding representation, and measured the mean absolute error of the predicted values. The same experiment was also performed on 'PPGN'. The experimental results are shown in Table 2 below.
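For reference, the mean absolute error used as the regression metric is the average of the absolute differences between the true and predicted values. The sketch below shows the calculation on made-up numbers that are not taken from QM9 or from the experiments; the function name `mean_absolute_error` is chosen for this example only.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(np.abs(y_true - y_pred)))

# Illustrative values only: absolute errors are 0.5, 0.0, and 1.0.
mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
print(mae)  # 0.5
```

A lower value indicates predictions that are closer to the targets.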

Table 2 (mean absolute error)

Target                                               PPGN       Proposed method
Dipole moment                                        0.231      0.0844
Isotropic polarizability                             0.382      0.0181
HOMO (highest occupied molecular orbital energy)     0.00287    0.00169
LUMO (lowest unoccupied molecular orbital energy)    0.00309    0.0018

Referring to Table 2, the proposed method also outperforms PPGN on the regression task, yielding a lower mean absolute error for every listed target. This indicates that using the proposed method can improve overall performance on various graph-related tasks.

So far, the results of performance experiments on the graph embedding method according to some embodiments of the present disclosure have been briefly introduced. Hereinafter, an exemplary computing device 140 capable of implementing the embedding system 10 described above will be described with reference to FIG. 14.

FIG. 14 is an exemplary hardware configuration diagram of the computing device 140.

As shown in FIG. 14, the computing device 140 may include one or more processors 141, a bus 143, a communication interface 144, a memory 142 into which a computer program executed by the processor 141 is loaded, and a storage 145 that stores a computer program 146. However, FIG. 14 shows only the components related to the embodiments of the present disclosure. Accordingly, a person of ordinary skill in the art to which the present disclosure pertains will understand that other general-purpose components may be included in addition to the components shown in FIG. 14. That is, the computing device 140 may further include various components other than those shown in FIG. 14. In some cases, the computing device 140 may also be configured with some of the components shown in FIG. 14 omitted. Each component of the computing device 140 is described below.

The processor 141 may control the overall operation of each component of the computing device 140. The processor 141 may include at least one of a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphic Processing Unit), or any type of processor well known in the technical field of the present disclosure. The processor 141 may also perform computations for at least one application or program for executing operations/methods according to various embodiments of the present disclosure. The computing device 140 may include one or more processors.

Next, the memory 142 may store various data, commands, and/or information. The memory 142 may load the computer program 146 from the storage 145 in order to execute operations/methods according to various embodiments of the present disclosure. The memory 142 may be implemented as a volatile memory such as RAM, but the technical scope of the present disclosure is not limited thereto.

Next, the bus 143 may provide a communication function between the components of the computing device 140. The bus 143 may be implemented as various types of buses, such as an address bus, a data bus, and a control bus.

Next, the communication interface 144 may support wired and wireless Internet communication of the computing device 140. The communication interface 144 may also support various communication methods other than Internet communication. To this end, the communication interface 144 may include a communication module well known in the technical field of the present disclosure.

Next, the storage 145 may store one or more computer programs 146 in a non-transitory manner. The storage 145 may include a non-volatile memory such as a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), or a flash memory, a hard disk, a removable disk, or any type of computer-readable recording medium well known in the technical field to which the present disclosure pertains.

Next, the computer program 146 may include one or more instructions that, when loaded into the memory 142, cause the processor 141 to perform operations/methods according to various embodiments of the present disclosure. That is, the processor 141 may perform operations/methods according to various embodiments of the present disclosure by executing the loaded instructions.

For example, the computer program 146 may include one or more instructions for performing an operation of obtaining a first embedding representation and a second embedding representation for a target graph, an operation of changing the value of the second embedding representation by reflecting a specific value in the second embedding representation, and an operation of generating an integrated embedding representation for the target graph by aggregating the first embedding representation and the changed second embedding representation. In such a case, the embedding system 10 according to some embodiments of the present disclosure may be implemented through the computing device 140.

Meanwhile, in some embodiments, the computing device 140 shown in FIG. 14 may refer to a virtual machine implemented based on cloud technology. For example, the computing device 140 may be a virtual machine running on one or more physical servers included in a server farm. In this case, at least some of the processor 141, the memory 142, and the storage 145 shown in FIG. 14 may be virtual hardware, and the communication interface 144 may also be implemented as a virtualized networking element such as a virtual switch.

So far, an exemplary computing device 140 capable of implementing the embedding system 10 according to some embodiments of the present disclosure has been described with reference to FIG. 14.

So far, various embodiments of the present disclosure and the effects of those embodiments have been described with reference to FIGS. 1 to 14. The effects according to the technical idea of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those of ordinary skill in the art from the description below.

In addition, the fact that a plurality of components have been described in the above embodiments as being combined into one or operating in combination does not mean that the technical idea of the present disclosure is necessarily limited to such embodiments. That is, within the scope of the purpose of the technical idea of the present disclosure, one or more of the components may be selectively combined and operated.

The technical idea of the present disclosure described so far may be implemented as computer-readable code on a computer-readable medium. A computer program recorded on a computer-readable recording medium may be transmitted to another computing device through a network such as the Internet and installed on that computing device, and may thereby be used on that computing device.

Although operations are shown in the drawings in a specific order, it should not be understood that the operations must be performed in the specific order shown or in sequential order, or that all of the illustrated operations must be performed, in order to obtain the desired results. In certain situations, multitasking and parallel processing may be advantageous. Although various embodiments of the present disclosure have been described above with reference to the accompanying drawings, a person of ordinary skill in the art to which the present disclosure pertains will understand that the technical idea of the present disclosure may be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present disclosure should be interpreted according to the claims below, and all technical ideas within the scope equivalent thereto should be interpreted as falling within the scope of rights of the technical idea defined by the present disclosure.

Claims

A graph embedding method performed by at least one computing device, the method comprising:
obtaining a first embedding representation and a second embedding representation for a target graph;
changing a value of the second embedding representation by reflecting a specific value in the second embedding representation; and
generating an integrated embedding representation for the target graph by aggregating the first embedding representation and the changed second embedding representation.

The graph embedding method of claim 1, wherein
one of the first embedding representation and the second embedding representation is generated by an embedding method that aggregates information of neighboring nodes constituting the target graph, and
the other of the first embedding representation and the second embedding representation is generated by an embedding method that reflects topology information of the target graph.

The graph embedding method of claim 1, wherein
the first embedding representation and the second embedding representation are generated by embedding the target graph through different GNNs (Graph Neural Networks).

The graph embedding method of claim 1, wherein
the specific value is an irrational number.

The graph embedding method of claim 1, wherein
the specific value is a value based on a learnable parameter, and the method further comprises:
predicting a label for a predetermined task based on the generated integrated embedding representation; and
updating the value of the learnable parameter based on a difference between the predicted label and a correct label for the target graph.

The graph embedding method of claim 1, wherein
reflecting the specific value in the second embedding representation is performed based on a multiplication operation, and
aggregating the first embedding representation and the changed second embedding representation is performed based on an addition operation.

The graph embedding method of claim 1, wherein
obtaining the first embedding representation and the second embedding representation comprises:
obtaining a first embedding matrix and a second embedding matrix for the target graph, wherein a size of the first embedding matrix is different from a size of the second embedding matrix; and
obtaining the first embedding representation and the second embedding representation by performing a resizing operation on at least one of the first embedding matrix and the second embedding matrix.

The graph embedding method of claim 7, wherein
the resizing operation is implemented through a multi-layer perceptron.

The graph embedding method of claim 7, wherein
generating the integrated embedding representation comprises:
generating a first embedding vector by performing a pooling operation on the first embedding representation;
generating a second embedding vector by performing a pooling operation on the changed second embedding representation, wherein the second embedding vector has the same dimensionality as the first embedding vector; and
generating the integrated embedding representation in vector form by aggregating the first embedding vector and the second embedding vector.

The graph embedding method of claim 1, wherein
obtaining the first embedding representation and the second embedding representation comprises:
obtaining the first embedding representation through a GNN (Graph Neural Network) of a neighboring node information aggregating approach; and
generating the second embedding representation by extracting topology information of the target graph using the first embedding representation.

The graph embedding method of claim 10, wherein
the first embedding representation is a three-dimensional embedding matrix generated by aggregating a feature matrix for node tuples, and
generating the second embedding representation comprises:
generating a two-dimensional embedding matrix by extracting diagonal elements of the three-dimensional embedding matrix; and
extracting the topology information of the target graph by analyzing the two-dimensional embedding matrix.

The graph embedding method of claim 10, wherein
extracting the topology information of the target graph is performed by computing a persistence diagram.

The graph embedding method of claim 1, wherein
generating the integrated embedding representation comprises:
performing a pooling operation on each of the first embedding representation and the changed second embedding representation; and
generating the integrated embedding representation by aggregating results of the pooling operation.

The graph embedding method of claim 1, wherein
generating the integrated embedding representation comprises:
obtaining a third embedding representation for the target graph;
changing a value of the third embedding representation by reflecting, in the third embedding representation, a value different from the specific value; and
generating the integrated embedding representation by aggregating the first embedding representation, the changed second embedding representation, and the changed third embedding representation.

The graph embedding method of claim 1, wherein
generating the integrated embedding representation comprises:
obtaining third to K-th embedding representations for the target graph, where K is a natural number of 3 or more;
changing a value of each of the third to K-th embedding representations by reflecting therein a value different from the specific value; and
generating the integrated embedding representation by aggregating the first embedding representation and the changed second to K-th embedding representations.

A graph embedding system comprising:
one or more processors; and
a memory storing one or more instructions,
wherein the one or more processors, by executing the stored one or more instructions, perform:
an operation of obtaining a first embedding representation and a second embedding representation for a target graph;
an operation of changing a value of the second embedding representation by reflecting a specific value in the second embedding representation; and
an operation of generating an integrated embedding representation for the target graph by aggregating the first embedding representation and the changed second embedding representation.

A computer program stored in a computer-readable recording medium and, in combination with a computing device, executing the steps of:
obtaining a first embedding representation and a second embedding representation for a target graph;
changing a value of the second embedding representation by reflecting a specific value in the second embedding representation; and
generating an integrated embedding representation for the target graph by aggregating the first embedding representation and the changed second embedding representation.