KR20190016249A

KR20190016249A - Method to derive cluster structure from similarity-based network

Info

Publication number: KR20190016249A
Application number: KR1020170100223A
Authority: KR
Inventors: 박주용; 전규현; 이동만
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology)
Priority date: 2017-08-08
Filing date: 2017-08-08
Publication date: 2019-02-18
Also published as: KR102106670B1

Abstract

According to one embodiment of the present invention, a method for deriving a cluster structure may comprise the steps of: modeling nodes as a complete network, in which all nodes are fully connected to one another; measuring the similarity between the nodes included in the complete network; and searching for a specific cluster in the complete network based on the measured similarity. The method enables more accurate similarity-based clustering.

Description

[0001] METHOD TO DERIVE CLUSTER STRUCTURE FROM SIMILARITY-BASED NETWORK

The following description relates to a technique for searching for clusters in a complete network weighted by positive similarities, in which all nodes are fully connected to one another.

In a typical judge-participant competition format, a panel of several judges evaluates the participants' performances to determine their ranking. Unlike another common competition format, in which two participants face off one-on-one to produce a winner, as in football or basketball, doubts about the judges' objectivity can often undermine public confidence in the fairness of the competition. By modeling the jury-participant competition format as a weighted bipartite network, one can identify biased scores and determine how they affect the competition and its structure. Taking the scoring controversy at the prestigious 2015 International Chopin Piano Competition as an example, the analysis shows that even a tiny fraction of biased edges can seriously distort our inferences about the network structure. This highlights the importance of detecting and removing bias in network inference.

Korean Patent Laid-Open No. 10-2016-0064448, meanwhile, relates to a method of providing item recommendations based on a comparison with the expected preferences of a similar set. It obtains fully completed predicted preferences from the actual preferences that all users assign to items and, when recommending an item to a specific user, recommends the item with the highest actual preference among the most similar items whose actual preference is lower than their expected preference.

The present invention aims to overcome the limitation that conventional cluster-search techniques could not find meaningful network clusters in similarity-weighted networks, by providing a technique for searching for clusters in a complete network weighted by positive similarities, in which all nodes are fully connected to one another.

A method for deriving a cluster structure may comprise: modeling nodes as a complete bipartite network, the complete network meaning that all nodes are fully connected to one another; measuring the similarity between the nodes included in the complete network; and searching for a specific cluster in the complete network based on the measured similarity.

Modeling the nodes as a complete bipartite network may comprise modeling, as a bipartite network, edges weighted by scores between a plurality of nodes (l nodes, where l is a natural number) and a plurality of other nodes (r nodes, where r is a natural number), the complete bipartite network being given in the form of an l × r bipartite adjacency matrix.

Modeling the nodes as a complete bipartite network may comprise weighting the single-mode projection of the complete network onto a node class, so that the weighted edges represent the pairwise association strength between nodes.

Modeling the nodes as a complete bipartite network may comprise defining the l × l similarity-weighted adjacency matrix of the single-mode projection network by applying a cosine similarity between the plurality of nodes (l nodes, where l is a natural number), computed over the plurality of other nodes.

Measuring the similarity between the nodes included in the complete network may comprise calculating the average similarity between the plurality of nodes based on all of the nodes.

Searching for a specific cluster in the complete network based on the measured similarity may comprise performing hierarchical clustering, which classifies nodes by identifying clusters of nodes based on the pairwise similarity or relatedness between them.

Searching for a specific cluster in the complete network based on the measured similarity may comprise using the hierarchical clustering to determine the modular structure of the complete network.

A computer program stored in a storage medium to execute the method for deriving a cluster structure may perform: modeling nodes as a complete bipartite network, the complete network meaning that all nodes are fully connected to one another; measuring the similarity between the nodes included in the complete network; and searching for a specific cluster in the complete network based on the measured similarity.

An apparatus for deriving a cluster structure may comprise: a modeling unit that models nodes as a complete bipartite network, the complete network meaning that all nodes are fully connected to one another; a measurement unit that measures the similarity between the nodes included in the complete network; and a search unit that searches for a specific cluster in the complete network based on the measured similarity.

The modeling unit may model, as a bipartite network, edges weighted by scores between a plurality of nodes (l nodes, where l is a natural number) and a plurality of other nodes (r nodes, where r is a natural number), the complete bipartite network being given in the form of an l × r bipartite adjacency matrix.

The modeling unit may weight the single-mode projection of the complete network onto a node class, so that the weighted edges represent the pairwise association strength between nodes.

The modeling unit may define the l × l similarity-weighted adjacency matrix of the single-mode projection network by applying a cosine similarity between the plurality of nodes (l nodes, where l is a natural number), computed over the plurality of other nodes.

The measurement unit may calculate the average similarity between the plurality of nodes based on all of the nodes.

The search unit may perform hierarchical clustering, which classifies nodes by identifying clusters of nodes based on the pairwise similarity or relatedness between them.

The search unit may use the hierarchical clustering to determine the modular structure of the complete network.

Unlike existing user-clustering methods, which first cluster users and then compare the similarities between the resulting clusters or among the users within a cluster, the present invention first computes the similarities between users and then clusters them based on those similarities, enabling more accurate similarity-based clustering.

The present invention can overcome the limitation that conventional cluster-search methods could not find meaningful network clusters in a complete network weighted by similarities.

Because it searches for clusters in a network built from similarities, the present invention can be applied to grouping users by their similar characteristics in a content recommendation system.

The present invention can be used to group users who exhibit similar characteristics in collaborative filtering, one of the techniques most commonly used to build systems that recommend personalized content to users.

FIG. 1 is a block diagram illustrating the components of a cluster structure apparatus according to an embodiment.
FIG. 2 is a flowchart illustrating the cluster structure method performed by the cluster structure apparatus according to an embodiment.
FIG. 3 is a diagram illustrating a plurality of nodes and a plurality of other nodes modeled as a bipartite network according to an embodiment.
FIG. 4 is a diagram illustrating the average pairwise similarity between nodes according to an embodiment.
FIG. 5 is a diagram illustrating the clustering of a plurality of nodes in a competition according to an embodiment.
FIG. 6 is a diagram illustrating the level z of a particular node as each of the plurality of other nodes is removed from the data in turn, according to an embodiment.
FIG. 7 is a diagram illustrating the modified modularity and the modular structure of the similarity network of a plurality of nodes according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating the components of a cluster structure apparatus according to an embodiment, and FIG. 2 is a flowchart illustrating the cluster structure method performed by that apparatus.

The cluster structure apparatus 100 searches for clusters in a complete network and may comprise a modeling unit 110, a measurement unit 120, and a search unit 130. The components of the cluster structure apparatus 100 may control the apparatus 100 to perform steps 210 to 230 of the cluster structure method of FIG. 2.

In step 210, the modeling unit 110 may model the nodes as a complete network; specifically, it may model them as a weighted complete bipartite network, where a complete network means that all nodes are fully connected to one another. For a plurality of nodes (l nodes, where l is a natural number) and a plurality of other nodes (r nodes, where r is a natural number), the modeling unit 110 may model the edges, weighted by the scores and given in the form of an l × r bipartite adjacency matrix, as a bipartite network. The modeling unit 110 may weight the single-mode projection of the complete network onto a node class, so that the weighted edges represent the pairwise association strength between nodes. By applying a cosine similarity between the plurality of nodes, computed over the plurality of other nodes, the modeling unit 110 may define the l × l similarity-weighted adjacency matrix of the single-mode projection network.

In step 220, the measurement unit 120 may measure the similarity between the nodes included in the complete network. The measurement unit 120 may calculate the average similarity between the plurality of nodes based on all of the nodes.

In step 230, the search unit 130 may search for a specific cluster in the complete network based on the measured similarity. The search unit 130 may perform hierarchical clustering, which classifies nodes by identifying clusters of nodes based on the pairwise similarity or relatedness between them, and may use the hierarchical clustering to determine the modular structure of the complete network.

FIG. 3 is a diagram illustrating a plurality of nodes and a plurality of other nodes modeled as a bipartite network according to an embodiment.

A plurality of nodes and a plurality of other nodes can be modeled as a bipartite network. The following description takes as an example the modeling of a plurality of judges (the plurality of nodes) and a plurality of participants (the plurality of other nodes) in a contest as a complete bipartite network. Networks have proven useful for modeling and understanding many complex systems found in fields as diverse as ecology, biology, and the social sciences, and they are closely tied to data and information systems such as the Internet, the World Wide Web, and citation networks. The class of complex systems studied here, to which networks can be fruitfully applied, is that of competitive systems, in which competition among the constituents plays a key role in the function and evolution of the system.

The competition between judges and participants can be modeled as a bipartite network of l judges (l a natural number) and r participants (r a natural number), whose weighted edges represent the scores and are given in the form of an l × r bipartite adjacency matrix.

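Equation 1:
$$\mathbf{S} = \begin{pmatrix} s_{11} & \cdots & s_{1r} \\ \vdots & \ddots & \vdots \\ s_{l1} & \cdots & s_{lr} \end{pmatrix}$$
where $s_{i\alpha}$ is the score judge $i$ gives participant $\alpha$, and the row vector $\mathbf{s}_i = (s_{i1}, \dots, s_{ir})$ holds all scores given by judge $i$.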
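As a minimal sketch of the modeling step, assuming the scores arrive as hypothetical (judge, participant, score) records and that every judge scores every participant (so that the bipartite network is complete), the l × r matrix could be assembled as follows. The function name `bipartite_adjacency` and the sample data are illustrative only and do not come from the competition.

```python
def bipartite_adjacency(records, judges, participants):
    """Assemble the l x r score matrix (judges as rows, participants as columns).

    records: iterable of (judge, participant, score) triples.
    Raises ValueError if some judge-participant pair has no score, since the
    method assumes a complete bipartite network.
    """
    row = {name: i for i, name in enumerate(judges)}
    col = {name: a for a, name in enumerate(participants)}
    matrix = [[0.0] * len(participants) for _ in judges]
    seen = set()
    for judge, participant, score in records:
        i, a = row[judge], col[participant]
        matrix[i][a] = float(score)
        seen.add((i, a))
    if len(seen) != len(judges) * len(participants):
        raise ValueError("every judge must score every participant")
    return matrix


# Hypothetical example: 2 judges, 3 participants.
judges = ["J1", "J2"]
participants = ["A", "B", "C"]
records = [
    ("J1", "A", 9), ("J1", "B", 7), ("J1", "C", 8),
    ("J2", "A", 8), ("J2", "B", 8), ("J2", "C", 6),
]
S = bipartite_adjacency(records, judges, participants)
```

Rejecting incomplete input is a design choice of this sketch: a silently missing score would otherwise appear as a 0 and distort every similarity computed from that row.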

For example, a competition may consist of 17 judges and 10 participants. FIG. 3(a) shows the bipartite network representation of the judges and participants in a jury competition, where the edge width (weight) represents the score a judge gives to a participant. FIG. 3(b) shows the weighted single-mode projection of the bipartite network onto one node class. When the judges or the participants of the network are selected as that class, the single-mode projection yields a weighted complete network; in FIG. 3(b), the edge weights represent the similarity between judges according to the scores they gave. Cosine similarity is used as the similarity measure.

Edge weights can be set to represent the pairwise association strength between nodes. The present invention applies, over the participants, the cosine similarity $\sigma_{ij}$ between judges $i$ and $j$, based on their score vectors $\mathbf{s}_i$ and $\mathbf{s}_j$, defined as follows.

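Equation 2:
$$\sigma_{ij} = \frac{\mathbf{s}_i \cdot \mathbf{s}_j}{\lVert \mathbf{s}_i \rVert\,\lVert \mathbf{s}_j \rVert} = \frac{\sum_{\alpha=1}^{r} s_{i\alpha}\, s_{j\alpha}}{\sqrt{\sum_{\alpha=1}^{r} s_{i\alpha}^{2}}\,\sqrt{\sum_{\alpha=1}^{r} s_{j\alpha}^{2}}}$$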

This defines the 17 × 17 positive (similarity-)weighted adjacency matrix $\Sigma = (\sigma_{ij})$ of the single-mode projection network.
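The projection onto the judges can be sketched in plain Python, assuming the score matrix is a list of rows with one row per judge. The names `cosine` and `project_onto_judges` and the 3 × 4 scores are hypothetical, chosen only to illustrate the calculation of the cosine similarity defined above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def project_onto_judges(scores):
    """Single-mode projection of an l x r score matrix onto its l judges (rows).

    Returns the l x l similarity-weighted adjacency matrix sigma[i][j].
    """
    return [[cosine(u, v) for v in scores] for u in scores]


# Hypothetical scores: 3 judges (rows) x 4 participants (columns).
scores = [
    [9, 8, 7, 6],
    [9, 8, 6, 7],
    [5, 6, 9, 9],
]
sigma = project_onto_judges(scores)
```

With positive scores every entry lies in (0, 1], which is why the projection is a complete network with positive weights. Here judges 0 and 1 (about 0.996) are more alike than judges 0 and 2 (about 0.927).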

FIG. 4 is a diagram illustrating the average pairwise similarity between nodes according to an embodiment.

A method for identifying atypical nodes in a competition network is described below. FIG. 4(a) shows each judge's average similarity to the other judges, computed from the full data. It shows that Yundi, with the highest average similarity, is overall the most similar to the other judges, while Entremont, with the lowest, is the least similar. This high similarity means that Yundi is the most typical, average judge and is therefore not particularly interesting. Entremont, in contrast, is potentially the most interesting, atypical case. Given the low score he gave to Cho's performance, one may wonder how much that score contributed to his atypicality. Of course, being different from the other judges is not a problem in itself; in a creative field, differing from others should be encouraged. If atypicality signals inconsistency, however, the problems it may cause deserve attention. To see the influence of that score, the same analysis was therefore performed after removing Cho from the data; the result is shown in FIG. 4(b). Entremont's similarity increases markedly, moving him to 7th place. This convincingly shows that his score for Cho was a very strong factor behind the apparent atypicality seen in FIG. 4(a).
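A minimal sketch of the FIG. 4 analysis, under the assumption that each judge's average similarity is taken over the other l - 1 judges: compute the averages, then repeat after dropping one participant's column. The function names and the 4 × 4 scores are hypothetical; the data are invented so that judge 3 gives participant 0 an atypically low score, and they are not the competition data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_similarity(scores):
    """Average similarity of each judge (row) to the other l - 1 judges."""
    l = len(scores)
    return [
        sum(cosine(scores[i], scores[j]) for j in range(l) if j != i) / (l - 1)
        for i in range(l)
    ]


def drop_participant(scores, column):
    """Copy of the score matrix with one participant (column) removed."""
    return [row[:column] + row[column + 1:] for row in scores]


# Hypothetical scores: judge 3 gives participant 0 an atypically low score.
scores = [
    [9, 8, 7, 6],
    [9, 7, 8, 6],
    [8, 8, 7, 6],
    [2, 8, 7, 6],
]
before = mean_similarity(scores)
after = mean_similarity(drop_participant(scores, 0))
```

On the full toy data judge 3 has the lowest average similarity (about 0.90); once participant 0 is removed, judge 3 rises to the top, tied with judges 0 and 2.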

Although FIG. 4 demonstrates to some extent the effect of a single uncharacteristic score, averaging each judge's edge weights (similarities) completely discards the information about the network structure, which could help us understand the problem in more detail, particularly with regard to the relationships between individual judges. Among the various analytical and computational methods for network analysis, hierarchical clustering can therefore be performed. Hierarchical clustering is most commonly used for node classification, identifying clusters of objects based on the pairwise similarity or relatedness between them. It builds a hierarchy of groups of objects, starting with each object in its own group and ending with all objects in a single group. The resulting hierarchy can be represented as the dendrogram shown in FIG. 5, which is generated for the judges of the competition by agglomerative clustering with average linkage, based on the cosine similarity of Equation 2.
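The agglomerative average-linkage procedure described above can be sketched directly on a similarity matrix: start from singletons and repeatedly merge the pair of clusters with the highest mean pairwise similarity. This plain-Python version, with a hypothetical function name and an invented 4 × 4 matrix, is O(n^3) and meant only for small examples. In practice one might instead pass distances such as 1 - σ to SciPy's `scipy.cluster.hierarchy.linkage` with `method='average'`; since averaging commutes with that linear transform, the merge order should match up to tie-breaking.

```python
def average_linkage(sim):
    """Agglomerative clustering with average linkage on a similarity matrix.

    Every node starts in its own cluster; at each step the two clusters with
    the highest mean pairwise similarity are merged, until one cluster remains.
    Returns (level, cluster_a, cluster_b, mean_similarity) for each merge,
    with level = 1, 2, ..., n - 1 counting the dendrogram levels.
    """
    clusters = [frozenset([i]) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                mean = sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or mean > best[0]:
                    best = (mean, x, y)
        mean, x, y = best
        merges.append((len(merges) + 1, clusters[x], clusters[y], mean))
        clusters[x] = clusters[x] | clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return merges


# Hypothetical 4 x 4 similarity matrix (symmetric, ones on the diagonal).
sim = [
    [1.0, 0.9, 0.5, 0.1],
    [0.9, 1.0, 0.6, 0.2],
    [0.5, 0.6, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
merges = average_linkage(sim)
```

Here nodes 0 and 1 merge first (level 1), node 2 joins at level 2 with mean similarity 0.55, and node 3 joins last at level 3.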

Before turning to clusters, we first discuss a useful quantity extracted from the dendrogram: the level z at which a given node first joins the dendrogram. The interpretation of z is simple. A node whose z is below a preset criterion joins the dendrogram early and is highly similar to the other nodes, whereas a node whose z is above the criterion joins late and has low similarity to the other nodes. Entremont's z agrees well with FIG. 4: in the full data set he is the last to join the dendrogram, at z = 16 (the maximum possible with 17 judges), whereas when Cho is removed he is among the earliest, at z = 3. The dendrograms for the two cases are shown in FIGS. 5(a) and 5(b). The dendrogram thus conveys almost the same information about each node's atypicality as FIG. 4 while preserving the hierarchical structure of the network, which makes z a simpler yet more useful quantity for characterizing node atypicality. To check whether other participants have a similar relationship with Entremont, the procedure was repeated, removing each participant from the data in turn and measuring Entremont's z; the results are shown in FIG. 6(a). No other participant produced a comparable effect on Entremont's z, which confirms the uncharacteristic nature of the score Entremont gave for Cho's performance. One of the most widespread speculations about the cause of Cho's low score, for example, was racial bias on Entremont's part.
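The level z can be read off the merge sequence: it is the step at which a node's singleton cluster is first merged, so z runs from 1 to n - 1 (16 for 17 judges). The sketch below, with hypothetical function names and invented 4 × 4 scores as a toy stand-in, computes one judge's z on the full data and with each participant removed in turn, mirroring the procedure behind FIG. 6(a). Ties between equally similar clusters are broken by cluster index, which is an assumption of this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def first_merge_levels(sim):
    """Average-linkage clustering; returns {node: level z of its first merge}."""
    clusters = [frozenset([i]) for i in range(len(sim))]
    z, level = {}, 0
    while len(clusters) > 1:
        _, x, y = max(
            (sum(sim[i][j] for i in a for j in b) / (len(a) * len(b)), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if y > x
        )
        level += 1
        for node in clusters[x] | clusters[y]:
            z.setdefault(node, level)  # keep only the first merge of each node
        clusters[x] = clusters[x] | clusters[y]
        del clusters[y]
    return z


def judge_z(scores, judge, drop=None):
    """z of one judge, optionally after removing participant column `drop`."""
    if drop is not None:
        scores = [row[:drop] + row[drop + 1:] for row in scores]
    sim = [[cosine(u, v) for v in scores] for u in scores]
    return first_merge_levels(sim)[judge]


# Hypothetical scores: judge 3 gives participant 0 an atypically low score.
scores = [
    [9, 8, 7, 6],
    [9, 7, 8, 6],
    [8, 8, 7, 6],
    [2, 8, 7, 6],
]
z_full = judge_z(scores, judge=3)
z_leave_one_out = [judge_z(scores, judge=3, drop=a) for a in range(4)]
```

In this toy case judge 3 joins last (z = 3, the maximum for four judges) on the full data, and only removing participant 0 lowers that z.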

To conveniently check for bias against a particular group of participants (non-white), a similarity analysis using z can be performed. The participants are divided into non-white and white participants, with Cho included in both, giving the following two groups. The participants' race is inferred from the surnames in their names.

- Non-white (5): Cho, Kobayashi, Liu, Lu, and Yang;

- White plus Cho (6): Cho, Hamelin, Jurinic, Osokins, Shishkin, and Szymon.

If Entremont truly treated the two racial groups differently, his score for Cho would stand out quite differently in each group. This can be examined with plots of the same kind as FIG. 6 (FIGS. 6(a), 6(b), and 6(c)). In both cases, Entremont's z decreases when Cho is excluded from the data. This suggests that Cho may have been singled out for reasons other than his race.
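The group test can reuse the same machinery by restricting the score matrix to one group's participant columns and comparing a judge's z with and without the target participant. A sketch with invented 4 × 5 scores and hypothetical groups, in which participant 0 plays the role of the participant included in both groups; function names are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def first_merge_levels(sim):
    """Average-linkage clustering; returns {node: level z of its first merge}."""
    clusters = [frozenset([i]) for i in range(len(sim))]
    z, level = {}, 0
    while len(clusters) > 1:
        _, x, y = max(
            (sum(sim[i][j] for i in a for j in b) / (len(a) * len(b)), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if y > x
        )
        level += 1
        for node in clusters[x] | clusters[y]:
            z.setdefault(node, level)
        clusters[x] = clusters[x] | clusters[y]
        del clusters[y]
    return z


def judge_z_in_group(scores, judge, columns):
    """z of `judge` computed only from the participant columns in `columns`."""
    sub = [[row[a] for a in columns] for row in scores]
    sim = [[cosine(u, v) for v in sub] for u in sub]
    return first_merge_levels(sim)[judge]


# Hypothetical scores; judge 3 gives participant 0 an atypically low score.
scores = [
    [9, 8, 7, 6, 7],
    [9, 7, 8, 6, 8],
    [8, 8, 7, 6, 7],
    [2, 8, 7, 6, 7],
]
groups = {"group_a": [0, 1, 2], "group_b": [0, 3, 4]}  # participant 0 in both
target = 0
results = {
    name: (
        judge_z_in_group(scores, 3, cols),
        judge_z_in_group(scores, 3, [a for a in cols if a != target]),
    )
    for name, cols in groups.items()
}
```

In both toy groups judge 3's z falls from 3 to 1 when participant 0 is removed, the pattern that points away from group-specific treatment.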

Hierarchical clustering is most often used to determine the modular structure of a network, typically by making a "cut" at some level of the dendrogram. A common way to choose the cut level is to maximize the so-called modularity Q, given by Equation 3 below.

Equation 3:
$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(C_i, C_j)$$

In Equation 3, m is the number of edges, $A_{ij}$ is the number of edges between nodes $i$ and $j$, $k_i$ is the degree of node $i$, $C_i$ and $C_j$ are the modules to which nodes $i$ and $j$ belong, and $\delta(C_i, C_j)$ is the Kronecker delta. The term in brackets is the difference between the actual number of edges between a node pair (0 or 1 in a simple graph) and its expected value at random given the node degrees. This quantity is generalized here to the single-mode projection network of FIG. 3(b), whose edges are weighted. Ignoring the leading constant $1/2m$, the direct generalization of Equation 3 can be expressed as Equation 4 below.

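Equation 4:
$$Q' = \sum_{i \neq j}\left[\sigma_{ij} - \langle\sigma_{ij}\rangle\right]\delta(C_i, C_j) = \sum_{i \neq j} q'_{ij}\,\delta(C_i, C_j)$$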

In Equation 4, $\langle\sigma_{ij}\rangle$ is the similarity expected when the scores (edge weights) in the complete bipartite network of FIG. 3(a) are randomly shuffled. For the cosine similarity of Equation 2, this expectation can be computed analytically: it equals the average of the cosine similarity over all permutations of the elements of $\mathbf{s}_i$ and $\mathbf{s}_j$. In fact, it suffices to permute only the elements of $\mathbf{s}_j$.
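The permutation average described above has a closed form, S_i S_j / (r ‖s_i‖ ‖s_j‖), where S_i is the sum of the elements of s_i: averaged over all r! orderings, every position of a permuted s_j holds the mean S_j / r, while the norms are unchanged by permutation. The sketch below checks this numerically by brute force for two hypothetical score vectors; enumerating r! permutations is feasible only for small r, which is why the closed form matters.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expected_cosine_bruteforce(s_i, s_j):
    """Mean of cos(s_i, pi(s_j)) over all r! orderings pi of s_j."""
    orderings = list(permutations(s_j))
    return sum(cosine(s_i, p) for p in orderings) / len(orderings)


def expected_cosine_closed_form(s_i, s_j):
    """S_i * S_j / (r * |s_i| * |s_j|), where S_i is the sum of s_i's elements."""
    r = len(s_i)
    norm_i = math.sqrt(sum(a * a for a in s_i))
    norm_j = math.sqrt(sum(b * b for b in s_j))
    return sum(s_i) * sum(s_j) / (r * norm_i * norm_j)


# Hypothetical score vectors of two judges over r = 5 participants.
s_i = [9, 8, 7, 6, 5]
s_j = [2, 8, 7, 6, 9]
brute = expected_cosine_bruteforce(s_i, s_j)
closed = expected_cosine_closed_form(s_i, s_j)
```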

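Equation 5:
$$\langle\sigma_{ij}\rangle = \frac{1}{r!}\sum_{k=1}^{r!}\frac{\mathbf{s}_i \cdot \pi_k(\mathbf{s}_j)}{\lVert \mathbf{s}_i \rVert\,\lVert \mathbf{s}_j \rVert} = \frac{S_i\,S_j}{r\,\lVert \mathbf{s}_i \rVert\,\lVert \mathbf{s}_j \rVert}$$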

where $\pi_k(\mathbf{s}_j)$ is the $k$-th of the $r!$ possible permutations of $\mathbf{s}_j$, and $S_i$ is the sum of the elements of $\mathbf{s}_i$. When this value is inserted into Equation 4 and Q' is maximized over the network, the maximum of Q' is a single module containing all the judges. This result stems from the nature of the summand $q'_{ij} = \sigma_{ij} - \langle\sigma_{ij}\rangle$: for the vast majority of node pairs (even those involving Entremont) it is positive, so making Q' positive and large favors $\delta(C_i, C_j) = 1$ for every pair of nodes $(i, j)$. In other words, all the judges end up in a single all-inclusive module. In contrast, Q (Equation 3) works well in simple networks because most of its summands are negative: for most node pairs in a simple network $A_{ij} = 0$, while $k_i k_j / 2m$ is always positive, so placing all nodes in a single group is not optimal. To identify a more meaningful modular structure in the weighted complete network, Q' must therefore be modified so that an appropriate number of its terms become negative. To this end, we propose subtracting a common positive value from every summand, namely the mean of $q'_{ij}$, denoted $\bar{q}'$:

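Equation 6:
$$\bar{q}' = \frac{1}{l(l-1)}\sum_{i \neq j}\left(\sigma_{ij} - \langle\sigma_{ij}\rangle\right) = \bar{\sigma} - \overline{\langle\sigma\rangle}$$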

where $\bar{\sigma}$ is the mean of the actual similarities in the data, $\langle\sigma_{ij}\rangle$ is the pairwise random expectation from Equation 5, and $\overline{\langle\sigma\rangle}$ is the mean of the pairwise random expectations. The modified modularity can then be expressed as Equation 7 below.

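Equation 7:
$$Q'' = \sum_{i \neq j}\left[q'_{ij} - \bar{q}'\right]\delta(C_i, C_j) = \sum_{i \neq j}\left[\left(\sigma_{ij} - \bar{\sigma}\right) - \left(\langle\sigma_{ij}\rangle - \overline{\langle\sigma\rangle}\right)\right]\delta(C_i, C_j)$$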

Equation 7 has the useful property of vanishing at both extremes of the dendrogram (that is, when every node is separate and when all nodes form a single module), so it naturally avoids the most trivial, uninformative solutions. To show that the method is applicable to large networks, note also that the algorithm runs in quadratic memory and time. When clustering r nodes, memory is needed for the network and for the pairwise similarities, and time is needed to compute the similarity of every pair of score vectors $\mathbf{s}_i$ and $\mathbf{s}_j$, as well as for the average similarity and the hierarchical clustering. Because the method requires only a similarity defined between nodes, it can also be applied to disconnected networks: the cosine similarity between two nodes belonging to two separate components is simply 0.
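Putting the modularity definitions together, a small sketch can trace Q' and Q'' along the levels of an average-linkage dendrogram. It assumes sums over ordered pairs i ≠ j and the mean of q'_ij taken over those same pairs, which is what makes Q'' vanish at both extremes; all function names and the 4-judge, 3-participant data are hypothetical. The data are built so that every q'_ij is positive: Q' then keeps growing until everything sits in one module, while Q'' peaks at the two-group split.

```python
import math


def cosine_matrix(scores):
    """Cosine similarity for every pair of judges (rows of the score matrix)."""
    norms = [math.sqrt(sum(a * a for a in s)) for s in scores]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(scores, norms)]
        for u, nu in zip(scores, norms)
    ]


def q_prime_matrix(scores):
    """q'_ij = sigma_ij - <sigma_ij>, using the closed-form permutation expectation.

    Diagonal entries are computed but never used (all sums run over i != j).
    """
    r = len(scores[0])
    norms = [math.sqrt(sum(a * a for a in s)) for s in scores]
    sigma = cosine_matrix(scores)
    n = len(scores)
    return [
        [sigma[i][j] - sum(scores[i]) * sum(scores[j]) / (r * norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


def modularities(q, labels):
    """(Q', Q'') for a partition given as one module label per node; pairs i != j."""
    n = len(q)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    q_bar = sum(q[i][j] for i, j in pairs) / len(pairs)  # mean of q'_ij
    same = [(i, j) for i, j in pairs if labels[i] == labels[j]]
    return (sum(q[i][j] for i, j in same),
            sum(q[i][j] - q_bar for i, j in same))


def _labels(clusters, n):
    """Module label of every node for the current list of clusters."""
    out = [0] * n
    for c, members in enumerate(clusters):
        for node in members:
            out[node] = c
    return out


def dendrogram_levels(sim):
    """Partitions at every level of average-linkage clustering; level 0 = singletons."""
    n = len(sim)
    clusters = [frozenset([i]) for i in range(n)]
    levels = [_labels(clusters, n)]
    while len(clusters) > 1:
        _, x, y = max(
            (sum(sim[i][j] for i in a for j in b) / (len(a) * len(b)), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if y > x
        )
        clusters[x] = clusters[x] | clusters[y]
        del clusters[y]
        levels.append(_labels(clusters, n))
    return levels


# Hypothetical scores: every judge ranks participant 0 first, but two pairs of
# judges disagree about participants 1 and 2.
scores = [
    [10, 6, 2],
    [10, 6, 2],
    [10, 2, 6],
    [10, 2, 6],
]
q = q_prime_matrix(scores)
levels = dendrogram_levels(cosine_matrix(scores))
q1_values = [modularities(q, lab)[0] for lab in levels]
q2_values = [modularities(q, lab)[1] for lab in levels]
best = levels[q2_values.index(max(q2_values))]
```

Selecting the cut with the largest Q'' rather than Q' is the point of the modification: on this toy data Q' would always choose the single all-inclusive module.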

We now plot Q'' again while traversing the dendrograms shown in FIG. 7 for two cases: FIG. 7(a), with the entire data set, and FIG. 7(b), with Cho removed from the data. As shown in FIG. 7(c), for the entire data including Cho, the maximum Q'' occurs at level z = 14 and yields three modules. When Cho is removed, the maximum Q'' occurs at z = 15, yielding two modules (FIG. 7(d)). The main difference between these two solutions is that Entremont forms a one-node module in the entire data; apart from Entremont, the Q''-maximizing module structures are nearly identical. This points to an important issue, the potential risk posed by a few biased edges, on which we focus here. In FIG. 7(a), the Q''-maximizing solution (z = 14) dominates all other possibilities in robustness, with a large relative difference between it and the second-best solution [values not reproduced in this text], whereas in FIG. 7(b) the difference between the two best solutions is much smaller [values not reproduced in this text].

Also, in FIG. 7(b) there are at least three other solutions with similar Q'' (z = 13, 12, 11). Given the small differences in Q'' between these solutions, a slightly different definition of modularity or a different clustering method might have identified some of them, or other comparable configurations, as optimal; this possibility, however, does not appear at all in FIG. 7(a). This means that a single atypical edge (accounting for only [fraction not reproduced in this text] of all edges) can lead to considerably different inferences about network properties. Another example of a change in Q'' is shown in FIGS. 7(e) and 7(f).

The present invention presents a network model for judge-participant competitions and suggests ways to understand the issues of fairness and bias. First, the most unusual or atypical scores are found by identifying the most dissimilar scores or judges. While the average similarity of an individual judge to the other judges provides a direct indicator based on pairwise similarity, the present invention shows that using a clustering dendrogram to graphically determine a judge's atypicality identifies such judges more intuitively.
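The average-similarity indicator mentioned above can be sketched as follows, assuming a symmetric r × r similarity matrix among judges, such as a cosine-similarity matrix. The function name and the 4-judge matrix are hypothetical; the judge with the lowest average similarity to the rest of the panel is flagged as the most atypical.

```python
import numpy as np


def mean_similarity_to_others(S: np.ndarray) -> np.ndarray:
    """Average similarity of each node to all other nodes, excluding self-similarity."""
    r = S.shape[0]
    return (S.sum(axis=1) - np.diag(S)) / (r - 1)


# Hypothetical similarity matrix for four judges; judge 3 agrees least with the panel.
S = np.array([
    [1.00, 0.95, 0.93, 0.60],
    [0.95, 1.00, 0.94, 0.62],
    [0.93, 0.94, 1.00, 0.58],
    [0.60, 0.62, 0.58, 1.00],
])
avg = mean_similarity_to_others(S)
most_atypical = int(np.argmin(avg))
print(most_atypical)  # 3
```

This gives a single number per judge; the dendrogram view described in the text conveys the same atypicality together with the group structure of the remaining judges.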

Analyzing the network with the standard definition of modularity extended to a form suitable for weighted full networks showed that the risk posed by a small fraction of abnormally biased edges is quite pronounced. A single edge, from Entremont to Cho, had a large effect on network inference. First, it led to a suboptimal community structure being found. Second, that structure appeared robust even though several comparable and reasonable solutions in fact exist. This large effect of a single biased edge is our main finding.

The present invention is marketable: personalized content recommendation systems are a technology that companies serving content to users have already adopted, or are adopting, as essential, and efforts to provide content tailored to users are expected to continue.

Personalized recommendation systems are already offered in various fields whose main purpose is to provide content to users, and users' demand for customized recommendations is expected to grow. The present invention can therefore be commercialized, for example by using its technique to develop a user-clustering tool for recommendation systems.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known to and usable by those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

Claims

A method for deriving a cluster structure, the method comprising:
Modeling nodes as a full bilateral network, wherein the full network means that all nodes are completely connected to each other;
Measuring a degree of similarity between nodes included in the full network; and
Searching for a specific cluster in the full network based on the measured similarity

The method according to claim 1,
Wherein modeling the nodes into a full bilateral network comprises:
Modeling the nodes as a plurality of nodes (l nodes, l being a natural number) and a plurality of other nodes (r nodes, r being a natural number), the full bilateral network being given in the form of an l × r bipartite adjacency matrix

The method according to claim 2,
Wherein modeling the nodes into a full bilateral network comprises:
Applying weights to a single-mode projection of the full network onto one node class, so that the weighted edges represent the pairwise association strength between the nodes

The method according to claim 2,
Wherein modeling the nodes into a full bilateral network comprises:
Applying a cosine similarity based on the plurality of nodes (l nodes, l being a natural number) to the plurality of other nodes, thereby defining a similarity-weighted adjacency matrix whose dimensions equal the number of nodes of the single-mode projection network

The method according to claim 1,
Wherein the step of measuring the degree of similarity between the nodes included in the complete network comprises:
Calculating the average similarity between the plurality of nodes based on all of the nodes

The method according to claim 1,
Wherein the step of searching for a specific cluster in the full network based on the measured similarity comprises:
Performing hierarchical clustering to classify nodes by identifying clusters of nodes based on the pairwise similarity or relevance between the nodes

The method according to claim 6,
Wherein the step of searching for a specific cluster in the full network based on the measured similarity comprises:
Wherein the hierarchical clustering is used to determine a modular structure of the complete network

A computer program stored in a storage medium for executing a method for deriving a cluster structure, the method comprising:
Modeling nodes as a full bilateral network, wherein the full network means that all nodes are completely connected to each other;
Measuring a degree of similarity between nodes included in the full network; and
Searching for a specific cluster in the full network based on the measured similarity

An apparatus for deriving a community structure,
A modeling unit for modeling nodes as a full bilateral network, wherein the full network means that all nodes are completely connected to each other;
A measurement unit for measuring a degree of similarity between nodes included in the full network; and
A search unit for searching a specific cluster in the complete network based on the measured similarity;
Wherein the apparatus is adapted to derive the community structure.

The apparatus according to claim 9,
The modeling unit,
Models the nodes as a plurality of nodes (l nodes, l being a natural number) and a plurality of other nodes (r nodes, r being a natural number), the full bilateral network being given in the form of an l × r bipartite adjacency matrix
Wherein the apparatus is adapted to derive the community structure.

The apparatus according to claim 10,
The modeling unit,
Applies weights to a single-mode projection of the full network onto one node class, so that the weighted edges represent the pairwise association strength between the nodes
Wherein the apparatus is adapted to derive the community structure.

The apparatus according to claim 10,
The modeling unit,
Applies a cosine similarity based on the plurality of nodes (l nodes, l being a natural number) to the plurality of other nodes, thereby defining a similarity-weighted adjacency matrix whose dimensions equal the number of nodes of the single-mode projection network
Wherein the apparatus is adapted to derive the community structure.

The apparatus according to claim 9,
Wherein the measuring unit comprises:
The average similarity between the plurality of nodes is calculated based on all of the nodes
Wherein the apparatus is adapted to derive the community structure.

The apparatus according to claim 9,
The searching unit,
Hierarchical clustering is performed to perform node classification by identifying clusters of nodes based on pair similarity or relevance between the nodes
Wherein the apparatus is adapted to derive the community structure.

The apparatus according to claim 14,
The searching unit,
Wherein the hierarchical clustering is used to determine a modular structure of the complete network
Wherein the apparatus is adapted to derive the community structure.