KR20190105358A - System and method for community detection of partially observed networks

Info

Publication number: KR20190105358A
Application number: KR1020180025799A
Authority: KR
Inventors: 신원용; 트란콩
Original assignee: 단국대학교 산학협력단
Priority date: 2018-03-05
Filing date: 2018-03-05
Publication date: 2019-09-17
Also published as: KR102024819B1

Abstract

The present invention relates to a system for detecting a community, which restores a representative node among omitted nodes even if a part of nodes or edges is omitted when collecting data in a network environment, where a plurality of nodes are interconnected to form a community, to reliably detect the community, and a method thereof. According to the present invention, the system comprises: an input unit receiving an observable node and an edge of the observable node in a network environment in which a plurality of nodes are interconnected to form a community; a restoration unit restoring an omitted node by using the observable node and the edge of the observable node; a selection unit selecting an influential node from the omitted nodes by using a preset logic; and an estimation unit estimating a community in the network environment by using the observable node and the influential node.

Description

SYSTEM AND METHOD FOR COMMUNITY DETECTION OF PARTIALLY OBSERVED NETWORKS

The present invention relates to a system and method for community detection in a partially observed network. More specifically, it targets a network environment in which a plurality of nodes are connected to one another to form communities, and relates to a system and method that detect communities more reliably by recovering representative nodes among the missing nodes, even when some nodes or edges are missing at data collection time.

Modern social relationships are becoming increasingly complex, and people play multiple roles within them. This means that people are entangled in many forms of relationships. For example, a person belongs to communities of many kinds: co-workers, co-authors, Facebook friends, club members, political groups, family, and so on.

These various kinds of communities are generally modeled with a multi-layer graph, in which the graph at each layer represents one of the several kinds of relationships.

Referring to FIG. 1, with respect to community detection in a network environment, Prior Art 1 assumes that the input data contains every node in the network environment and every edge of those nodes. Consequently, community detection performance degrades in a network environment where nodes and edges are only partially observable.

Meanwhile, a technique for recovering missing nodes has been proposed to solve the problem of Prior Art 1. Referring to FIG. 2, Prior Art 2 infers the missing nodes from the visible nodes and then performs community detection based on the visible nodes and all of the missing nodes. However, recovering every missing node in the network environment, as Prior Art 2 does, fails to improve performance because of inference errors, and therefore could not achieve better community detection than Prior Art 1.

Korean Patent Application Publication No. 10-2017-0111268

The present invention has been proposed to solve the problems of the prior art described above. Its object is to provide a system and method that, for a network environment in which a plurality of nodes are connected to one another to form communities, detect communities more reliably by recovering representative nodes among the missing nodes, even when some nodes or edges are missing at data collection time.

To achieve the above object, a system for community detection in a partially observed network according to the technical idea of the present invention comprises: an input unit that receives visible nodes and the edges of the visible nodes in a network environment in which a plurality of nodes form a network and thereby form communities; a recovery unit that recovers missing nodes using the visible nodes and their edges; a selection unit that selects main nodes from among the missing nodes using preset logic; and an estimation unit that estimates communities in the network environment using the visible nodes and the main nodes.

The recovery unit may generate a matrix whose size corresponds to the number of visible nodes and missing nodes.

The preset logic may sum the entries of the row or column of the matrix corresponding to each missing node, and then select as main nodes those missing nodes whose sum is greater than or equal to a preset reference value.

Alternatively, the preset logic may analyze centrality over the region of the matrix corresponding to the missing nodes, and then select as main nodes those missing nodes that fall within a preset range.

The recovery unit may generate the matrix using a Kronecker model and an expectation-maximization algorithm.

The recovery unit may also: generate a visible matrix reflecting the connections among the visible nodes, using only the visible nodes and their edges; obtain a generation parameter matrix and a node permutation from the visible matrix using the expectation-maximization algorithm; generate a probabilistic adjacency matrix by repeatedly multiplying the generation parameter matrix by itself (a Kronecker power); reflect the information of the visible matrix in the probabilistic adjacency matrix; derive success events and failure events by performing Bernoulli trials over the region of the probabilistic adjacency matrix corresponding to the missing nodes; and generate a recovery matrix by setting the entries corresponding to success events to 1 and the entries corresponding to failure events to 0.

The estimation unit may estimate communities using non-negative matrix factorization (NMF).

Meanwhile, to achieve the above object, a method for community detection in a partially observed network according to the technical idea of the present invention comprises the steps of: receiving, by an input unit, visible nodes and the edges of the visible nodes in a network environment in which a plurality of nodes form a network and thereby form communities; recovering, by a recovery unit, missing nodes using the visible nodes and their edges; selecting, by a selection unit, main nodes from among the missing nodes using preset logic; and estimating, by an estimation unit, communities in the network environment using the visible nodes and the main nodes.

In the step in which the recovery unit recovers the missing nodes using the visible nodes and their edges, a matrix whose size corresponds to the number of visible nodes and missing nodes may be generated.

The matrix may be generated by the recovery unit using a Kronecker model and an expectation-maximization algorithm.

The step in which the recovery unit recovers the missing nodes using the visible nodes and their edges may comprise: generating a visible matrix reflecting the connections among the visible nodes, using the visible nodes and their edges; obtaining a generation parameter matrix and a node permutation from the visible matrix using the expectation-maximization algorithm; generating a probabilistic adjacency matrix by repeatedly multiplying the generation parameter matrix by itself (a Kronecker power); reflecting the information of the visible matrix in the probabilistic adjacency matrix; deriving success events and failure events by performing Bernoulli trials over the region of the probabilistic adjacency matrix corresponding to the missing nodes; and generating a recovery matrix by setting the entries corresponding to success events to 1 and the entries corresponding to failure events to 0.

In the step in which the estimation unit estimates communities in the network environment using the visible nodes and the main nodes, the communities may be estimated using non-negative matrix factorization (NMF).

The system and method for community detection in a partially observed network according to the present invention provide the following effects.

First, the present invention can be applied to community detection in various network environments that are only partially observable, such as networks in which personal data is only partially exposed or graphs in which only part of the data has been sampled. For example, it is applicable to a variety of complex systems, including biological, social, and technological networks.

Second, because only those missing nodes that have many edges are selectively used, the detected communities are closer to the original community structure than with the prior art.

Third, because part of the process is similar both to the approach that detects communities using only visible nodes and to the approach that uses visible nodes and all missing nodes, the invention can replace conventional systems and therefore has broad applicability.

Fourth, the present invention can be used in any setting where the required information must be inferred from partial data only.

FIG. 1 is a diagram showing the process of Prior Art 1, which detects communities using only visible nodes.
FIG. 2 is a diagram showing the process of Prior Art 2, which detects communities using visible nodes and all recovered missing nodes.
FIG. 3 is a block diagram of a system for community detection in a partially observed network according to an embodiment of the present invention.
FIG. 4 is a diagram showing the process by which the system and method for community detection in a partially observed network according to an embodiment of the present invention derive the main nodes and then detect communities.
FIG. 5 is an exemplary diagram showing how the recovery unit generates the recovery matrix according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram showing how the selection unit selects main nodes using the recovery matrix according to an embodiment of the present invention.
FIG. 7 is a diagram showing, in matrix form, the relationship between communities and nodes obtained through the estimation unit according to an embodiment of the present invention.
FIG. 8 is a flowchart of a method for community detection in a partially observed network according to an embodiment of the present invention.
FIG. 9 is a flowchart showing the detailed process of step S120 according to an embodiment of the present invention.
FIGS. 10 and 11 are graphs comparing the performance of the system and method for community detection in a partially observed network according to an embodiment of the present invention with that of Prior Art 1 and Prior Art 2 when tested under the same conditions.

A system and method for community detection in a partially observed network according to embodiments of the present invention will now be described in detail with reference to the accompanying drawings. Because the present invention can be modified in various ways and can take various forms, specific embodiments are illustrated in the drawings and described in detail in the text. This is not intended to limit the present invention to the specific forms disclosed; the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the drawings, similar reference numerals are used for similar elements.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

The system 100 for community detection in a partially observed network according to an embodiment of the present invention is a system that is included in, and performs its functions on, a computing device such as a computer or a server. The embodiment may be included in a single computing device, or it may be distributed across a plurality of computing devices in an environment where those devices are interconnected over a communication network.

Referring to FIGS. 3 and 4, the system 100 for community detection in a partially observed network according to an embodiment of the present invention includes: an input unit 110 that receives visible nodes and the edges of the visible nodes in a network environment in which a plurality of nodes form a network and thereby form communities; a recovery unit 120 that recovers missing nodes using the visible nodes and their edges; a selection unit 140 that selects main nodes from among the missing nodes using preset logic; and an estimation unit 160 that estimates communities in the network environment using the visible nodes and the main nodes.

The network environment may include social networks, in which users form friend relationships with one another and share information such as posts, photos, and videos, as well as biological networks such as neural networks.

By way of example, this embodiment is described on the assumption that the network environment is a social network.

In a social network, a node is a user, and an edge represents the friend relationship that the user has formed with another user, or the number of such relationships. In other words, an edge can be understood as a link connecting one node to another. A node may have zero or more edges.

The nodes include at least visible nodes and missing nodes. In this embodiment, a node that is collected when the input unit 110 gathers data about the network environment is called a visible node, and a node that exists in the actual network environment but is not collected through the input unit 110 is called a missing node.

As part of recovering the missing nodes, the recovery unit 120 generates a matrix corresponding to the number of visible nodes and missing nodes. That is, the numbers of rows and columns of the matrix correspond to the total number of visible and missing nodes.

Referring to FIG. 5, the recovery unit 120 generates the matrix using a Kronecker model and an expectation-maximization algorithm.

Specifically, the recovery unit 120 generates a visible matrix reflecting the connections among the visible nodes, using only the visible nodes and their edges. The visible matrix is a square matrix whose number of rows and columns equals the number of visible nodes. Each entry of the matrix represents an edge. For example, if the second user and the third user are friends in the social network, the entry at position [2, 3] is 1. For users who are not friends, the entry is 0.
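The construction of the visible matrix can be sketched as follows. This is only an illustrative sketch: the function name `build_visible_matrix`, the 1-based edge list, and the sample edges are assumptions introduced here, and friend relationships are assumed to be mutual so that the matrix is symmetric.

```python
import numpy as np

def build_visible_matrix(num_visible, edges):
    """Return a 0/1 adjacency matrix for the visible nodes.

    `edges` holds 1-based (u, v) pairs, matching the [2, 3] example in the text.
    """
    A = np.zeros((num_visible, num_visible), dtype=int)
    for u, v in edges:
        A[u - 1, v - 1] = 1
        A[v - 1, u - 1] = 1  # assumption: friendship is mutual (undirected edge)
    return A

# Hypothetical sample: six visible users, three observed friendships.
visible = build_visible_matrix(6, [(2, 3), (1, 2), (4, 6)])
```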

The recovery unit 120 also obtains the generation parameter matrix (Θ) and the node permutation (σ) from the visible matrix using the expectation-maximization algorithm. The resulting generation parameter matrix is a square matrix, for example 2×2, 3×3, or 5×5. Because each entry of the generation parameter matrix represents a probability, it takes a value between 0 and 1.

The recovery unit 120 also generates a probabilistic adjacency matrix by applying the Kronecker product, that is, by raising the generation parameter matrix (Θ) to the K-th Kronecker power. Because the generation parameter matrix is square and the probabilistic adjacency matrix is obtained as its K-th Kronecker power, the probabilistic adjacency matrix is also square.

The number of rows and columns of the probabilistic adjacency matrix is preferably at least the total number of nodes, visible and missing combined. For example, with a 2×2 generation parameter matrix and 100 nodes in total, the exponent K is set to 7 or more, so that the probabilistic adjacency matrix has 128 rows and columns, which exceeds the number of nodes. Rows and columns beyond the total number of nodes are removed at this step or a later one. The probabilistic adjacency matrix thus serves as a table of suitable size for representing the connections among all nodes. In the embodiment of FIG. 5, K is set to 3, and raising the 2×2 generation parameter matrix (Θ) to the third Kronecker power yields an 8×8 probabilistic adjacency matrix.
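The Kronecker power and the choice of K can be illustrated with a short sketch. The helper names `kronecker_power` and `smallest_exponent` and the numeric values of `theta` are hypothetical; in the described system the generation parameter matrix would come from the expectation-maximization step, which is not reproduced here.

```python
import numpy as np

def kronecker_power(theta, k):
    """Return the k-th Kronecker power of the square matrix theta (k >= 1)."""
    P = theta.copy()
    for _ in range(k - 1):
        P = np.kron(P, theta)
    return P

def smallest_exponent(n0, num_nodes):
    """Smallest K such that n0**K is at least num_nodes."""
    k = 1
    while n0 ** k < num_nodes:
        k += 1
    return k

theta = np.array([[0.9, 0.6],
                  [0.6, 0.2]])        # hypothetical 2x2 generation parameters
P = kronecker_power(theta, 3)         # 8x8, as in the FIG. 5 example
K_for_100_nodes = smallest_exponent(2, 100)   # 2**7 = 128 >= 100
```

Every entry of `P` is a product of K entries of `theta`, so it stays between 0 and 1 and can be read as an edge probability.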

The recovery unit 120 also reflects the information of the visible matrix in the probabilistic adjacency matrix. In the embodiment of FIG. 5, the visible matrix is 6×6. The recovery unit 120 replaces the 6×6 region of the 8×8 probabilistic adjacency matrix with the values of the visible matrix.

The recovery unit 120 also derives success events and failure events by performing Bernoulli trials over the region of the probabilistic adjacency matrix that corresponds to the missing nodes (the region not replaced by the visible matrix). Before the Bernoulli trials are performed, each entry in the region of the missing nodes has a value between 0 and 1. The Bernoulli trials are performed in order to infer the connections (edges) of the missing nodes. A trial is applied to each entry to derive a success event or a failure event, where u denotes the row and v denotes the column of the entry. The Bernoulli trials are performed only on the entries that fall within the region of the missing nodes.

After the Bernoulli trials, each entry that resulted in a success event is replaced with 1, and each entry that resulted in a failure event is replaced with 0.

The matrix finally obtained is the recovery matrix: its numbers of rows and columns correspond to the total number of nodes, and every entry is 0 or 1. Because the recovery matrix contains the connections of the missing nodes as well as those of the visible nodes, the edges of the missing nodes have in effect been recovered.
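The steps from the probabilistic adjacency matrix to the recovery matrix can be sketched as below. Several points are assumptions made for illustration only: each entry is read as the success probability of its own Bernoulli trial, the node permutation σ is ignored (the visible nodes are simply placed first), only the upper triangle is sampled and mirrored so that the result is symmetric, and self-loops are excluded. The name `build_recovery_matrix` and the numeric values are hypothetical.

```python
import numpy as np

def build_recovery_matrix(P, visible, num_nodes, rng):
    """Combine observed edges with Bernoulli-sampled edges for missing nodes."""
    n_vis = visible.shape[0]
    probs = P[:num_nodes, :num_nodes]      # drop rows/columns beyond the node count
    # One Bernoulli trial per entry: success (1) with probability probs[u, v].
    draws = (rng.random((num_nodes, num_nodes)) < probs).astype(int)
    upper = np.triu(draws, k=1)            # assumption: undirected, no self-loops
    R = upper + upper.T
    R[:n_vis, :n_vis] = visible            # observed region is kept exactly as observed
    return R

rng = np.random.default_rng(0)
theta = np.array([[0.9, 0.6], [0.6, 0.2]])   # hypothetical generation parameters
P = np.kron(np.kron(theta, theta), theta)    # 8x8 probabilistic adjacency matrix
visible = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (1, 2), (3, 5)]:        # hypothetical observed edges (0-based)
    visible[u, v] = visible[v, u] = 1
R = build_recovery_matrix(P, visible, 8, rng)
```

Only the rows and columns of the two missing nodes are random here; rerunning with a different seed changes those entries but never the observed 6×6 region.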

As illustrated, the entries of the 7th row and the 7th column, which correspond to the seventh node, may be symmetric to each other, and the entries of the 8th row and the 8th column, which correspond to the eighth node, may likewise be symmetric to each other.

The edges of the missing nodes shown in the recovery matrix may differ from the edges in the actual network environment. As described above, the missing-node entries of the recovery matrix are derived by generating the generation parameter matrix from the visible matrix, raising it to a Kronecker power to obtain the probabilistic adjacency matrix, and performing Bernoulli trials on that matrix. In other words, the edges of the missing nodes in the recovery matrix are obtained by statistical inference, so they may contain errors.

Referring to FIG. 6, the selection unit 140 selects main nodes from among the missing nodes using preset logic. The missing nodes here are the ones recovered by the recovery unit 120.

The preset logic may be implemented in at least two ways.

The preset logic according to the first embodiment sums the entries of the row or column of the recovery matrix corresponding to each missing node. Once the sums have been computed for every missing node, the missing nodes whose sum is greater than or equal to a preset reference value are selected as main nodes. For example, in the embodiment of FIG. 6 the preset reference value is defined as 2, and the entries are summed over the 7th and 8th rows of the recovery matrix, which correspond to the missing nodes. For the 7th row, the entries [1, 7] and [3, 7] are 1, so the sum is 2. For the 8th row, only the entry [1, 8] is 1, so the sum is 1. Therefore, only the 7th row meets the preset reference value, and only the missing node corresponding to the 7th row (the seventh node) is selected as a main node.
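The row-sum rule of the first embodiment can be reproduced on the FIG. 6 example with a few lines. The function name `select_main_nodes` and the way the recovery matrix is filled in are assumptions for illustration; only the edges of the two missing nodes mentioned in the text are set, since the visible region does not affect the result.

```python
import numpy as np

def select_main_nodes(R, num_visible, threshold):
    """Return 1-based indices of missing nodes whose row sum meets the threshold."""
    sums = R[num_visible:].sum(axis=1)      # one sum per missing node
    return [num_visible + i + 1 for i, s in enumerate(sums) if s >= threshold]

R = np.zeros((8, 8), dtype=int)
for u, v in [(1, 7), (3, 7), (1, 8)]:       # 1-based entries from the FIG. 6 example
    R[u - 1, v - 1] = R[v - 1, u - 1] = 1

main_nodes = select_main_nodes(R, num_visible=6, threshold=2)   # -> [7]
```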

The preset logic according to the second embodiment analyzes centrality over the region of the recovery matrix corresponding to the missing nodes, and then selects as main nodes the missing nodes that fall within a preset range.
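The text does not say which centrality measure or which range is used, so the following is only one possible reading: eigenvector centrality computed by power iteration over the whole recovery matrix, with the "preset range" interpreted as the top-k missing nodes. The function names, the sample graph, and the choice of measure are all assumptions.

```python
import numpy as np

def eigenvector_centrality(A, iters=200):
    """Power iteration on A + I (the shift avoids oscillation on bipartite graphs)."""
    M = A + np.eye(A.shape[0])
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)
    return x

def select_by_centrality(R, num_visible, top_k):
    """Return 1-based indices of the top_k most central missing nodes."""
    c = eigenvector_centrality(R.astype(float))
    missing = np.arange(num_visible, R.shape[0])
    ranked = missing[np.argsort(-c[missing])]
    return sorted(int(i) + 1 for i in ranked[:top_k])

# Hypothetical 8-node recovery matrix: nodes 1-6 visible, nodes 7-8 missing.
R = np.zeros((8, 8), dtype=int)
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 7), (3, 7), (5, 7), (1, 8)]
for u, v in edges:
    R[u - 1, v - 1] = R[v - 1, u - 1] = 1

selected = select_by_centrality(R, num_visible=6, top_k=1)
```

In this sample, node 7 is linked to three visible nodes and node 8 to one, so node 7 receives the higher score and is the one selected.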

Once the selection unit 140 has selected the main nodes, the estimation unit 160 estimates communities in the network environment using the visible nodes and the main nodes. Non-negative matrix factorization (NMF) may be used for the community estimation.
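Before estimation, the recovery matrix has to be restricted to the visible nodes and the selected main nodes. One way to do this is sketched below; the helper name `reduce_to_selected` is hypothetical, and the sample matrix reuses the missing-node edges of the FIG. 6 example.

```python
import numpy as np

def reduce_to_selected(R, num_visible, main_nodes):
    """Keep only the rows/columns of visible nodes and the given main nodes (1-based)."""
    keep = list(range(num_visible)) + [m - 1 for m in main_nodes]
    return R[np.ix_(keep, keep)]

R = np.zeros((8, 8), dtype=int)
for u, v in [(1, 7), (3, 7), (1, 8)]:     # missing-node edges from the FIG. 6 example
    R[u - 1, v - 1] = R[v - 1, u - 1] = 1

A_reduced = reduce_to_selected(R, num_visible=6, main_nodes=[7])   # 7x7 matrix
```

The eighth node and its edge are dropped entirely, so they cannot influence the subsequent factorization.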

Referring to FIG. 7, the relationship between communities and nodes is expressed by a matrix F. The likelihood that the u-th node belongs to community c is given by a weight, namely the entry of F in row u and column c.

For example, if matrix A is 100×100 and there are five communities, the 100×100 matrix is expressed as the product of a 100×5 matrix and a 5×100 matrix. Denoting the 100×5 matrix by F, multiplying F by its transpose yields A.

In this embodiment, matrix A has been obtained by inference. Even though A is not exact, given an 80×80 matrix made up of the visible nodes and the main nodes, it is known that this matrix factors into an 80×5 matrix times a 5×80 matrix, and that these two factors are transposes of each other. Therefore only the 80×5 matrix needs to be inferred.

Non-negative matrix factorization is used to find the 80×5 matrix. With non-negative matrix factorization, each node is annotated with information such as which of the five communities it belongs to and whether it belongs to more than one. Nodes that belong to no community are also identified.
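A small sketch of a symmetric factorization of the form A ≈ F Fᵀ is given below. The text does not specify the update rule, so the damped multiplicative update used here is an assumption, as are the function name `symmetric_nmf`, the iteration count, and the toy graph of two four-node cliques. Assigning each node to the community with its largest weight is likewise only one simple reading of the weights.

```python
import numpy as np

def symmetric_nmf(A, num_communities, iters=500, seed=0, eps=1e-9):
    """Approximate A by F @ F.T with F >= 0, using damped multiplicative updates."""
    rng = np.random.default_rng(seed)
    F = rng.random((A.shape[0], num_communities))
    for _ in range(iters):
        numer = A @ F
        denom = F @ (F.T @ F) + eps        # eps guards against division by zero
        F *= 0.5 + 0.5 * numer / denom     # keeps every entry non-negative
    return F

# Toy graph: nodes 0-3 form one clique, nodes 4-7 form another.
A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)

F = symmetric_nmf(A, num_communities=2)
labels = F.argmax(axis=1)                  # strongest community per node
```

Thresholding the rows of `F` instead of taking the maximum would allow a node to be assigned to several communities, or to none.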

As mentioned above, the edges of the missing nodes are obtained by statistical inference and may therefore contain errors. If all of the error-prone missing nodes are used for community estimation, the errors accumulate and produce a larger error. Conversely, if the missing nodes are excluded entirely, or only a handful of them are used, there is not enough information to detect the communities, and the correct community structure cannot be estimated.

The main nodes selected in this embodiment have many edges, so they can be expected to carry fewer errors than missing nodes with few edges. In addition, a node with many edges generally acts as a representative of its community, so using such nodes gives higher reliability than estimating communities with all of the low-degree missing nodes included. For example, if one academic department is regarded as one community, the community consists of all the students (nodes) enrolled in that department, yet observing only the department representative, who interacts (edges) with a large share of the students in the community, is enough to estimate the character of that community. A missing node with few edges, on the other hand, may be a student from another department who merely knows the department representative personally, so treating it as a valid node can introduce errors into the community estimation result.

Accordingly, the method of estimating communities using only the visible nodes and the main nodes according to an embodiment of the present invention can detect communities more reliably than the conventional method that uses only the visible nodes or the method that uses the visible nodes together with all of the missing nodes.

Next, a method for community detection of a partially observed network according to an embodiment of the present invention is described.

Referring to FIGS. 4 and 8, the method for community detection of a partially observed network according to an embodiment of the present invention includes: a step (S110) in which the input unit 110 receives visible nodes and the edges of the visible nodes in a network environment in which a plurality of nodes are networked to form communities; a step (S120) in which the recovery unit 120 recovers missing nodes using the visible nodes and the edges of the visible nodes; a step (S140) in which the selection unit 140 selects main nodes from among the missing nodes using preset logic; and a step (S160) in which the estimation unit 160 estimates communities in the network environment using the visible nodes and the main nodes.

In step S120, a matrix whose size corresponds to the number of visible nodes and missing nodes is generated. The recovery unit 120 generates this matrix using a Kronecker model and an expectation-maximization (EM) algorithm.

Referring to FIGS. 6 and 9, step S120 includes: generating a visible matrix that reflects the connections among the visible nodes, using the visible nodes and the edges of the visible nodes (S121); obtaining a generation parameter matrix and a node permutation value using the visible matrix and the expectation-maximization algorithm (S122); generating a probabilistic adjacency matrix by raising the generation parameter matrix to a power a plurality of times (S124); reflecting the information of the visible matrix in the probabilistic adjacency matrix (S126); performing Bernoulli trials in the region of the probabilistic adjacency matrix corresponding to the missing nodes to derive success events and failure events (S127); and generating a recovery matrix by setting the entries corresponding to success events to 1 and the entries corresponding to failure events to 0 (S128).
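
Steps S124 to S128 can be sketched as below. Several points are assumptions made only for illustration: the repeated multiplication of the generation parameter matrix is read as a Kronecker power, as is usual for the Kronecker graph model; the 2 × 2 matrix `theta` and the exponent `k` are hypothetical values standing in for the output of the expectation-maximization step (S122), which is not shown; and the node permutation is taken to have been applied already, so that the visible nodes occupy the first rows and columns.

```python
import numpy as np

def kronecker_power(theta, k):
    """Return the k-th Kronecker power of the generation parameter matrix."""
    P = theta.copy()
    for _ in range(k - 1):
        P = np.kron(P, theta)
    return P

def recover(visible_adj, theta, k, seed=0):
    """Build a 0/1 recovery matrix from the visible block and theta."""
    rng = np.random.default_rng(seed)
    P = kronecker_power(theta, k)          # S124: edge probabilities
    n, m = P.shape[0], visible_adj.shape[0]
    P[:m, :m] = visible_adj                # S126: observed entries become 0 or 1
    # S127: one Bernoulli trial per node pair. Entries fixed to 0 or 1 are
    # deterministic, so only the missing-node region is actually random.
    draws = rng.random((n, n)) < P
    upper = np.triu(draws, 1)              # one draw per undirected pair
    return (upper | upper.T).astype(int)   # S128: success -> 1, failure -> 0

# Hypothetical inputs: 5 visible nodes in a ring, 8 nodes in total (2**3).
visible_adj = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]:
    visible_adj[i, j] = visible_adj[j, i] = 1

theta = np.array([[0.9, 0.5],
                  [0.5, 0.2]])
R = recover(visible_adj, theta, k=3)
```

The sketch assumes an undirected network without self-loops, so trials are drawn for the upper triangle only and mirrored.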

The preset logic of step S140 may take at least two forms.

The preset logic according to the first embodiment sums the entries of the row or column of the recovery matrix corresponding to each missing node, and selects as main nodes those missing nodes whose sum is greater than or equal to a preset reference value.
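
A minimal sketch of this selection rule follows. The function name `select_main_nodes`, the toy recovery matrix, and the reference value of 3 are hypothetical; the sketch also assumes that the visible nodes occupy the first rows and columns of the recovery matrix.

```python
import numpy as np

def select_main_nodes(recovered, num_visible, threshold):
    """Return indices of missing nodes whose row sum (degree) in the
    recovery matrix is at least the reference value."""
    degrees = recovered.sum(axis=1)
    missing = np.arange(num_visible, recovered.shape[0])
    return missing[degrees[missing] >= threshold]

# Hypothetical recovery matrix: nodes 0-2 are visible, nodes 3-5 were missing.
R = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 3), (1, 3), (2, 3), (3, 4), (2, 5)]:
    R[i, j] = R[j, i] = 1

main = select_main_nodes(R, num_visible=3, threshold=3)

# Keep only the visible nodes and the selected main nodes for estimation.
kept = np.concatenate([np.arange(3), main])
reduced = R[np.ix_(kept, kept)]
```

The reduced matrix is what would then be passed on to the community estimation step, in place of either the visible block alone or the full recovery matrix.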

Step S160 estimates communities using non-negative matrix factorization (NMF).

[Simulation]

To measure the community detection performance of the system 100 and method for community detection of a partially observed network according to an embodiment of the present invention, experiments were conducted on data sets labeled with ground-truth communities. Community detection was also performed with the prior art under the same conditions.

Prior art 1 detects communities using only the visible nodes, and prior art 2 detects communities using the visible nodes together with all of the missing nodes.

Normalized mutual information (NMI), a widely used measure, was used to evaluate community detection performance.
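
For reference, NMI between a ground-truth labeling and a detected labeling can be computed as sketched below. The specification does not state which NMI variant was used; this sketch assumes the common form for non-overlapping labels, normalized by the arithmetic mean of the two entropies, and a different variant would be needed for overlapping communities. The function name `nmi` and the example labels are hypothetical.

```python
from collections import Counter
from math import log

def nmi(labels_true, labels_pred):
    """Normalized mutual information of two labelings of the same nodes,
    using 2 * I(T; P) / (H(T) + H(P)) as the assumed normalization."""
    n = len(labels_true)
    count_t = Counter(labels_true)
    count_p = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum((c / n) * log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    h_t = -sum((c / n) * log(c / n) for c in count_t.values())
    h_p = -sum((c / n) * log(c / n) for c in count_p.values())
    if h_t == 0 or h_p == 0:
        # Degenerate case: at least one labeling has a single community.
        return 1.0 if h_t == h_p else 0.0
    return 2 * mi / (h_t + h_p)

same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

The score depends only on how nodes are grouped, not on the community names, which is why a relabeled but otherwise identical partition scores the same as the original.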

Referring to FIG. 10, when community detection was performed on five kinds of synthetic data sets using an embodiment of the present invention, prior art 1, and prior art 2, the embodiment of the present invention showed markedly higher community detection performance.

Referring also to FIG. 11, when community detection was performed on the real-world Amazon and DBLP data sets using an embodiment of the present invention, prior art 1, and prior art 2, the embodiment of the present invention was again confirmed to show markedly higher community detection performance.

Although preferred embodiments of the present invention have been described above, various changes, modifications, and equivalents may be used. It is clear that the present invention can be applied equally by appropriately modifying the above embodiments. Therefore, the above description does not limit the scope of the present invention, which is defined by the limits of the following claims.

100: community detection system 110: input unit
120: recovery unit 140: selection unit
160: estimation unit

Claims

A system for community detection of a partially observed network, comprising:
an input unit configured to receive a visible node and an edge of the visible node in a network environment in which a plurality of nodes are networked to form a community;
a recovery unit configured to recover a missing node using the visible node and the edge of the visible node;
a selection unit configured to select a main node from among the missing nodes using preset logic; and
an estimation unit configured to estimate a community in the network environment using the visible node and the main node.

The system of claim 1,
wherein the recovery unit generates a matrix corresponding to the number of the visible nodes and the missing nodes.

The system of claim 2, wherein the preset logic
sums the components of the row or column corresponding to the missing node in the matrix, and then selects, as the main node, a missing node whose summed value is greater than or equal to a preset reference value.

The system of claim 2, wherein the preset logic
analyzes the centrality of the region of the matrix corresponding to the missing node, and then selects, as the main node, a missing node falling within a preset range.

The system according to any one of claims 2 to 4,
wherein the recovery unit generates the matrix using a Kronecker model and an expectation-maximization algorithm.

The system of claim 5,
wherein the recovery unit:
generates a visible matrix reflecting a connection relationship between the visible nodes using only the visible node and the edge of the visible node;
obtains a generation parameter matrix and a node permutation value using the visible matrix and the expectation-maximization algorithm;
generates a probabilistic adjacency matrix by raising the generation parameter matrix to a power a plurality of times;
reflects information of the visible matrix in the probabilistic adjacency matrix;
derives a success event and a failure event by performing Bernoulli trials in the region of the probabilistic adjacency matrix corresponding to the missing node; and
generates a recovery matrix by setting a component corresponding to the success event to 1 and a component corresponding to the failure event to 0.

The system of claim 1,
wherein the estimation unit estimates the community using non-negative matrix factorization (NMF).

A method for community detection of a partially observed network, comprising:
receiving, by an input unit, a visible node and an edge of the visible node in a network environment in which a plurality of nodes are networked to form a community;
recovering, by a recovery unit, a missing node using the visible node and the edge of the visible node;
selecting, by a selection unit, a main node from among the missing nodes using preset logic; and
estimating, by an estimation unit, a community in the network environment using the visible node and the main node.

The method of claim 8, wherein the recovery unit recovers the missing node using the visible node and the edge of the visible node,
And a matrix corresponding to the number of visible nodes and the missing nodes is generated.

The method of claim 9, wherein the preset logic
sums the components of the row or column corresponding to the missing node in the matrix, and then selects, as the main node, a missing node whose summed value is greater than or equal to a preset reference value.

The method of claim 9, wherein the preset logic
analyzes the centrality of the region of the matrix corresponding to the missing node, and then selects, as the main node, a missing node falling within a preset range.

The method according to any one of claims 9 to 11,
wherein the matrix is generated by the recovery unit using a Kronecker model and an expectation-maximization algorithm.

The method of claim 12, wherein the step in which the recovery unit recovers the missing node using the visible node and the edge of the visible node comprises:
generating a visible matrix reflecting a connection relationship between the visible nodes using the visible node and the edge of the visible node;
obtaining a generation parameter matrix and a node permutation value using the visible matrix and the expectation-maximization algorithm;
generating a probabilistic adjacency matrix by raising the generation parameter matrix to a power a plurality of times;
reflecting information of the visible matrix in the probabilistic adjacency matrix;
deriving a success event and a failure event by performing Bernoulli trials in the region of the probabilistic adjacency matrix corresponding to the missing node; and
generating a recovery matrix by setting a component corresponding to the success event to 1 and a component corresponding to the failure event to 0.

The method of claim 8, wherein the step in which the estimation unit estimates a community in the network environment using the visible node and the main node
estimates the community using non-negative matrix factorization (NMF).