KR101089294B1

KR101089294B1 - Method and apparatus for mapping data in structured peer-to-peer network

Info

Publication number: KR101089294B1
Application number: KR1020100012016A
Authority: KR
Inventors: 이영희; 박세형; 김도현; 황수안퉁
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)
Priority date: 2010-02-09
Filing date: 2010-02-09
Publication date: 2011-12-05
Also published as: KR20110092540A

Abstract

The present invention relates to a data mapping method and apparatus for a structured P2P network. Because it uses an arithmetic mapping technique, it does not destroy the lexicographic order of the data and can therefore support complex queries. Moreover, when the occurrence probabilities of the symbols are known, data is mapped uniformly onto keys even when the data itself is not uniformly distributed, so separate load balancing can be avoided or its overhead minimized.

Description

METHOD AND APPARATUS FOR MAPPING DATA IN STRUCTURED PEER-TO-PEER NETWORK

The present invention relates to a data mapping method and apparatus for a structured P2P network and, more particularly, to a data mapping method and apparatus capable of supporting complex queries in a structured P2P network.

Peer-to-peer (P2P) is an overlay network service that exists at the application layer. That is, it creates a new community at the application layer, establishes logical relationships among the nodes in that community, and provides routing for queries in the process of finding a desired resource.

To provide connectivity among nodes in a distributed environment, four commercial P2P network models have appeared to date; they are classified into structured and unstructured P2P according to how routing information is organized. Unstructured P2P includes centralized P2P networks, purely distributed P2P networks, and hybrid P2P networks, while structured P2P includes Distributed Hash Table (DHT)-based P2P networks.

Among these, structured P2P networks most widely use a randomizing hash function to map data to nodes. Random hashing provides efficient data retrieval and good load balancing, but it destroys the lexicographic order of the data, making it impossible to support complex queries such as range queries, inexact queries, and wildcard queries.
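To make the contrast concrete, here is a minimal sketch of a randomizing hash mapping. It assumes SHA-1, the hash Chord uses, and keeps only the top `bits` bits so the result fits exactly in a float in [0, 1); the function name `hash_key` is hypothetical. Lexicographically adjacent strings generally land at unrelated positions, which is why a range query cannot be answered by visiting a contiguous run of keys.

```python
import hashlib


def hash_key(data: str, bits: int = 32) -> float:
    """Map a string to a pseudo-random key in [0, 1) (hypothetical helper).

    Uses the top `bits` bits of a SHA-1 digest; bits <= 53 keeps the
    division exact in a double, so the result is strictly below 1.0.
    """
    if not 1 <= bits <= 53:
        raise ValueError("bits must be between 1 and 53")
    digest = hashlib.sha1(data.encode("utf-8")).digest()  # 160-bit digest
    value = int.from_bytes(digest, "big") >> (160 - bits)
    return value / (1 << bits)


if __name__ == "__main__":
    # Sorted input, but the printed keys are not expected to be sorted.
    for word in sorted(["apple", "apricot", "avocado", "banana"]):
        print(f"{word:8s} -> {hash_key(word):.6f}")
```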

FIG. 1 is a block diagram of a data mapping apparatus for a structured P2P network according to the prior art.

As shown, the conventional data mapping apparatus includes a data input unit 10 that receives a data space, a hash mapping unit 20 that converts the data space into a key space of a specific length using a random hash, and a key distribution unit 30 that distributes the key space converted by the hash mapping unit 20 to the peer nodes 41, 43, and 45 of the structured P2P network 40.

In this prior-art data mapping apparatus for a structured P2P network, the hash mapping unit 20 uses the following technique to map the data space to the key space.

The mapping technique $f_{raw}$ is defined as in Equation 1 for an object $X = X_1 X_2 X_3 \cdots X_m$, where $X \in S$ and $X_k \in A$.

$f_{raw}(X)$ is a real number in the interval [0, 1) (0 inclusive, 1 exclusive) that serves as the index of X, and N denotes the cardinality of A. $f_{raw}(X)$ can also be written as Equation 2.

When $f_{raw}(X)$ is used as the key of X, the lexicographic order of S is preserved, so complex queries can be supported. However, when symbols do not occur with equal frequency, this mapping skews the key space. As a result, if the data distribution is not uniform, data concentrates on particular nodes, which requires a separate load-balancing mechanism. For example, one load-balancing scheme samples the load of nodes so that lightly loaded nodes leave the network and rejoin at the positions of heavily loaded nodes. However, this kind of load balancing increases the number of node leaves and joins, imposing considerable overhead on the network.
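Equations 1 and 2 are not reproduced in this text, so the following sketch rests on an assumption: that $f_{raw}$ reads X as a base-N fraction, $f_{raw}(X) = \sum_{k=1}^{m} \mathrm{idx}(X_k)\,N^{-k}$, where idx gives a symbol's 0-based position in A. The 26-letter alphabet and the helper names are hypothetical. The example shows both properties discussed above: key order follows lexicographic order, and data that shares a few leading letters piles into one bin of the key space.

```python
import math
import string
from fractions import Fraction

ALPHABET = string.ascii_lowercase  # hypothetical symbol set A, N = 26


def f_raw(x: str, alphabet: str = ALPHABET) -> Fraction:
    """Assumed raw mapping: read x as a base-N fraction in [0, 1)."""
    n = len(alphabet)
    key = Fraction(0)
    for k, symbol in enumerate(x, start=1):
        key += Fraction(alphabet.index(symbol), n ** k)
    return key


def histogram(keys, bins: int) -> list:
    """Count keys per equal-width bin of [0, 1)."""
    counts = [0] * bins
    for key in keys:
        counts[math.floor(key * bins)] += 1
    return counts


words = ["apple", "apricot", "avocado", "banana", "blueberry", "cherry"]
keys = [f_raw(w) for w in words]
# Order is preserved (non-strictly: under this form "a" and "aa" both map to 0).
assert all(a <= b for a, b in zip(keys, keys[1:]))
print(histogram(keys, 4))  # [6, 0, 0, 0]: every key falls below 3/26
```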

Currently, a mapping technique called Raw is mainly used to support complex queries. However, although Raw mapping does not destroy the lexicographic order of the data, it can cause serious load-balancing problems when the data distribution is skewed.

The present invention has been proposed to solve the above-described problems of the prior art, and provides an indexing scheme that supports complex queries without causing load-balancing problems, even when the data distribution is skewed.

According to a first aspect of the present invention, a data mapping apparatus for a structured P2P network may include a data input unit that receives a data space; an arithmetic mapping unit that converts the data space into a key space using an arithmetic mapping technique; and a key distribution unit that distributes the key space converted by the arithmetic mapping unit to the peer nodes of a structured P2P network.

Here, the arithmetic mapping unit may convert the data space into a key space having numeric values.

The arithmetic mapping unit may convert the data space into a key space having hexadecimal values.

According to a second aspect of the present invention, a data mapping method for a structured P2P network may include receiving a data space; converting the data space into a key space using an arithmetic mapping technique; and distributing the converted key space to the peer nodes of a structured P2P network.

Here, the converting step may convert the data space into a key space having numeric values.

The converting step may convert the data space into a key space having hexadecimal values.

According to embodiments of the present invention, lexicographic order is not destroyed, so complex queries can be supported. Moreover, when the occurrence probabilities of the symbols are known, data is mapped uniformly onto keys even when the data itself is not uniformly distributed, so separate load balancing can be avoided or its overhead minimized.

FIG. 1 is a block diagram of a data mapping apparatus for a structured P2P network according to the prior art.
FIG. 2 is a block diagram of a data mapping apparatus for a structured P2P network according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram illustrating arithmetic mapping according to an embodiment of the present invention.
FIG. 4 is a flowchart of the arithmetic mapping technique, illustrating a data mapping method for a structured P2P network according to an embodiment of the present invention.
FIG. 5 is a graph comparing raw mapping and random hash mapping according to the prior art with arithmetic mapping according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that this disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the present invention pertains. The present invention is defined only by the scope of the claims.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations will be omitted where they might unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intentions or customs of users and operators; their definitions should therefore be based on the contents of this entire specification.

Combinations of the blocks of the accompanying block diagrams and the steps of the flowcharts may be performed by computer program instructions. These computer program instructions may be loaded onto a processor of a general-purpose computer, special-purpose computer, or other programmable data processing equipment, so that the instructions executed by that processor create means for performing the functions described in each block of the block diagrams or each step of the flowcharts. These computer program instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to implement functions in a particular manner, so that the instructions stored in that memory produce an article of manufacture containing instruction means that perform the functions described in each block of the block diagrams or each step of the flowcharts. The computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on it to produce a computer-implemented process, and the instructions executed on the computer or other programmable data processing equipment provide steps for performing the functions described in each block of the block diagrams and each step of the flowcharts.

In addition, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for executing the specified logical function(s). It should also be noted that, in some alternative embodiments, the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially concurrently, or the blocks or steps may sometimes be performed in reverse order, depending on the functions involved.

FIG. 2 is a block diagram of a data mapping apparatus for a structured P2P network according to an embodiment of the present invention.

As shown, the data mapping apparatus according to the embodiment of the present invention includes a data input unit 110 that receives a data space, an arithmetic mapping unit 120 that converts the data space into a key space of a specific length using an arithmetic mapping technique, and a key distribution unit 130 that distributes the key space converted by the arithmetic mapping unit 120 to the peer nodes 141, 143, and 145 of the structured P2P network 140.
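The specification does not say how the key distribution unit 130 assigns keys to peer nodes. Purely as an assumption, the sketch below uses a Chord-like successor rule on the unit ring [0, 1): each key goes to the first node whose position is at or after it, wrapping around. The node positions and function names are hypothetical. Because an order-preserving key space keeps neighboring data on neighboring nodes, a range query then reduces to a contiguous run of nodes.

```python
import bisect


def assign_to_node(key: float, node_ids: list) -> float:
    """Successor rule: first node ID >= key, wrapping to the smallest ID.

    node_ids must be a sorted list of positions in [0, 1).
    """
    i = bisect.bisect_left(node_ids, key)
    return node_ids[i % len(node_ids)]


def nodes_for_range(lo: float, hi: float, node_ids: list) -> list:
    """Nodes responsible for keys in [lo, hi], assuming lo <= hi."""
    start = bisect.bisect_left(node_ids, lo)
    end = bisect.bisect_left(node_ids, hi)
    # dict.fromkeys drops duplicates while keeping ring order
    return list(dict.fromkeys(node_ids[i % len(node_ids)]
                              for i in range(start, end + 1)))


peers = [0.25, 0.5, 0.75]  # hypothetical positions of peer nodes 141, 143, 145
print(assign_to_node(0.3, peers))        # 0.5
print(assign_to_node(0.8, peers))        # 0.25 (wraps around)
print(nodes_for_range(0.3, 0.6, peers))  # [0.5, 0.75]
```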

In the data mapping apparatus for a structured P2P network according to this embodiment, the arithmetic mapping unit 120 uses an arithmetic mapping technique to map the data space to the key space. This preserves lexicographic order while, depending on the accuracy of the statistical model of the data space, achieving a load-balancing effect similar to that of the conventional random hash technique. Although FIG. 2 illustrates the case in which the arithmetic mapping unit 120 converts data into keys having hexadecimal values, this is merely one example of converting the data space into a key space having numeric values.
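The specification does not describe how an interval produced by arithmetic mapping becomes a hexadecimal key. One simple possibility, offered here only as an assumption, is to truncate the interval's lower bound to a fixed number of hexadecimal digits: truncation is monotone, and fixed-length lowercase hex strings compare in the same order as the numbers they encode, so lexicographic order survives. The helper name `to_hex_key` and the 8-digit length are hypothetical.

```python
import math
from fractions import Fraction


def to_hex_key(low: Fraction, digits: int = 8) -> str:
    """Truncate a point in [0, 1) to a fixed-length hexadecimal key.

    Truncation is monotone, so keys keep the order of the interval bounds
    (two bounds sharing their first `digits` hex digits get the same key).
    """
    if not 0 <= low < 1:
        raise ValueError("low must lie in [0, 1)")
    scaled = math.floor(low * 16 ** digits)
    return format(scaled, f"0{digits}x")


# Lower bound of the FIG. 3 interval [0.534, 0.54)
print(to_hex_key(Fraction(534, 1000)))  # 88b43958
```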

FIG. 3 shows an example of the arithmetic mapping technique.

Assume there are four symbols, A, B, C, and eos (end of sequence), with occurrence probabilities of 60% for A, 20% for B, 10% for C, and 10% for eos. If the input string is AC followed by eos, it is mapped as shown in FIG. 3 to a value between 0.534 and 0.54. With the segments ordered A, B, C, eos, the symbol A selects [0, 0.6), C then selects [0.48, 0.54) within it, and eos finally selects [0.534, 0.54).
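The FIG. 3 numbers can be checked with exact rational arithmetic. This sketch assumes an order-0 model with the probabilities given above and lays the segments out in the order A, B, C, eos, which is the layout that reproduces [0.534, 0.54); the names `MODEL` and `encode_interval` are hypothetical.

```python
from fractions import Fraction

# Order-0 model from the FIG. 3 example, in segment order (eos last).
MODEL = [("A", Fraction(6, 10)), ("B", Fraction(2, 10)),
         ("C", Fraction(1, 10)), ("eos", Fraction(1, 10))]


def encode_interval(symbols, model=MODEL):
    """Narrow [0, 1) once per symbol; return the final [low, high)."""
    low, high = Fraction(0), Fraction(1)
    for sym in symbols:
        width, cum = high - low, Fraction(0)
        for name, p in model:
            if name == sym:
                low, high = low + width * cum, low + width * (cum + p)
                break
            cum += p
        else:
            raise KeyError(sym)
    return low, high


low, high = encode_interval(["A", "C", "eos"])
print(float(low), float(high))  # 0.534 0.54
```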

The probability model used in arithmetic mapping is finite context modeling (FCM). This model expresses probabilities based on the number of times each symbol to be mapped to a key appears; in other words, it builds a context recording how many times each symbol has appeared.

A model also has an order, which is the number of consecutive preceding symbols considered when building a context. An order-0 model computes the probability of each symbol independently. For example, if an order-0 model estimates the probability of the symbol "u" at 5%, an order-1 model may estimate the probability of "u" immediately after "q" at 95%, because in English "u" very frequently follows "q". Higher orders therefore give more accurate predictions. However, as the order increases linearly, the required memory grows exponentially, so an excessively high order is undesirable.
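A finite context model of order k can be sketched as a table of next-symbol counts keyed by the previous k symbols. The class below is a hypothetical illustration, not the patent's implementation; in particular, the add-one (Laplace) smoothing is an assumption, added so that symbols never seen in a context still receive a non-zero probability and therefore a non-empty sub-interval. The toy corpus mirrors the "q" followed by "u" example: the order-1 model gives "u" a higher probability after "q" than the order-0 model does.

```python
from collections import Counter, defaultdict
from fractions import Fraction

EOS = "<eos>"  # hypothetical marker for the end-of-sequence symbol


class FiniteContextModel:
    """Order-k finite context model with add-one smoothing (an assumption)."""

    def __init__(self, alphabet, order):
        self.symbols = [EOS] + list(alphabet)
        self.order = order
        self.counts = defaultdict(Counter)  # context tuple -> next-symbol counts

    def context(self, prefix):
        # The last `order` symbols (fewer near the start); order 0 has no context.
        return tuple(prefix[-self.order:]) if self.order > 0 else ()

    def train(self, sequences):
        for seq in sequences:
            seq = list(seq) + [EOS]
            for i, sym in enumerate(seq):
                self.counts[self.context(seq[:i])][sym] += 1

    def prob(self, symbol, prefix):
        counts = self.counts[self.context(list(prefix))]
        total = sum(counts.values())
        return Fraction(counts[symbol] + 1, total + len(self.symbols))


corpus = ["qu", "qu", "qa"]
order0 = FiniteContextModel("aqu", order=0)
order1 = FiniteContextModel("aqu", order=1)
order0.train(corpus)
order1.train(corpus)
print(order0.prob("u", "q"), order1.prob("u", "q"))  # 3/13 3/7
```

With N symbols there are up to N^k distinct order-k contexts, which is the exponential memory growth mentioned above.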

FIG. 4 conceptually illustrates the arithmetic mapping technique in order to explain the data mapping method for a structured P2P network according to an embodiment of the present invention.

In the following description, N is the number of symbols and the (N+1)-th symbol is the eos symbol. X is a sequence, and ε denotes the eos symbol.

First, in step 1, the process starts with the interval [0, 1).

In step 2, the interval is divided into N+1 level-1 segments. For $k = 1, \ldots, N$, the length of the (k+1)-th segment is proportional to the probability of the symbol $S_k$. For example, when no symbol has appeared before, the length of the (k+1)-th segment equals the probability that the symbol $S_k$ occurs in the sequence X.

In step 3, if $X_1 = S_k$, the sequence X is mapped to the (k+1)-th sub-interval; if $X_1 = \varepsilon$, it is mapped to the first sub-interval.

Next, in step 4, if symbols of the sequence X remain to be processed, the process returns to step 2 and divides the selected level-1 interval into level-2 intervals whose lengths are proportional to $\Pr(X_2 = S_k \mid X_1)$.

Steps 2 to 4 above are repeated until all symbols of the sequence X have been processed.

As can be seen from FIG. 4, in the arithmetic mapping according to the embodiment of the present invention, the sub-intervals are not of equal width; each width is proportional to the occurrence probability of the corresponding symbol.
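Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the sequence is assumed to be terminated by ε (here the string "<eos>"), probabilities come from a hypothetical callback `cond_prob(prefix)` that returns $\Pr(X_{i+1} = \cdot \mid X_1 \ldots X_i)$ for every symbol, and exact fractions avoid rounding. Following step 3, ε occupies the first segment, unlike the FIG. 3 example, which places eos last; with ε first, a string maps below all of its own extensions, as lexicographic order requires.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "<eos>"  # stands for the end-of-sequence symbol ε
CondProb = Callable[[List[str]], Dict[str, Fraction]]


def arithmetic_map(x: Sequence[str], alphabet: Sequence[str],
                   cond_prob: CondProb) -> Tuple[Fraction, Fraction]:
    """Map sequence x to a sub-interval [low, high) of [0, 1), per steps 1-5."""
    low, high = Fraction(0), Fraction(1)      # step 1: start with [0, 1)
    segments = [EOS] + list(alphabet)         # ε first, then S_1 ... S_N
    symbols = list(x) + [EOS]                 # assume x is terminated by ε
    for i, sym in enumerate(symbols):         # steps 4-5: one pass per symbol
        probs = cond_prob(symbols[:i])        # Pr(X_{i+1} = . | X_1 ... X_i)
        width, cum = high - low, Fraction(0)
        for seg in segments:                  # step 2: N + 1 segments
            if seg == sym:                    # step 3: keep the matching one
                low, high = low + width * cum, low + width * (cum + probs[seg])
                break
            cum += probs[seg]
        else:
            raise ValueError(f"unknown symbol {sym!r}")
    return low, high


def order0(prefix: List[str]) -> Dict[str, Fraction]:
    """Static order-0 model with the FIG. 3 probabilities (ignores prefix)."""
    return {EOS: Fraction(1, 10), "A": Fraction(6, 10),
            "B": Fraction(2, 10), "C": Fraction(1, 10)}


print([float(v) for v in arithmetic_map("AC", "ABC", order0)])  # [0.64, 0.646]
```

Passing an order-k model as `cond_prob` yields the conditional splits of step 4; the static model here corresponds to order 0.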

FIG. 5 is a graph comparing how uniformly skewed data is mapped when using raw mapping and random hash mapping according to the prior art, and arithmetic mapping (AM) according to an embodiment of the present invention. The y-axis is χ², and the x-axis is the number of bins, that is, the number of intervals into which the key space is divided. In AM-2, AM-3, AM-4, and AM-5, the number denotes the order of the FCM described above. As the graph of FIG. 5 shows, arithmetic mapping according to the embodiment of the present invention performs similarly to random hashing from AM-3 onward.
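The specification does not say exactly how the y-axis values in FIG. 5 are computed. Assuming they are Pearson chi-square statistics of key counts over equal-width bins against a uniform expectation, a sketch looks like this (the function name is hypothetical); lower values mean a more uniform mapping, and 0 means every bin holds exactly the expected number of keys.

```python
import math


def chi_square_uniformity(keys, bins: int) -> float:
    """Pearson chi-square of per-bin key counts against a uniform expectation.

    Keys are assumed to lie in [0, 1); bin b covers [b/bins, (b+1)/bins).
    """
    counts = [0] * bins
    for key in keys:
        counts[min(math.floor(key * bins), bins - 1)] += 1
    expected = len(keys) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


print(chi_square_uniformity([0.1, 0.3, 0.6, 0.9], 4))                  # 0.0
print(chi_square_uniformity([0.01, 0.02, 0.03, 0.04, 0.05, 0.06], 4))  # 18.0
```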

110: data input unit    120: arithmetic mapping unit
130: key distribution unit    140: structured P2P network
141, 143, 145: peer nodes

Claims

A data input unit that receives a data space;
An arithmetic mapping unit that converts the data space into a key space using an arithmetic mapping technique; and
A key distribution unit that distributes the key space converted by the arithmetic mapping unit to peer nodes of a structured P2P network,
Wherein, where N is the number of symbols, the (N+1)-th symbol is the eos symbol, X is a sequence, and ε denotes the eos symbol, the arithmetic mapping technique comprises:
A first step of setting an interval;
A second step of dividing the interval into N+1 level-1 segments, where, for $k = 1, \ldots, N$, the length of the (k+1)-th segment is proportional to the probability of the symbol $S_k$;
A third step of mapping the sequence X to the (k+1)-th sub-interval if $X_1 = S_k$, and to the first sub-interval if $X_1 = \varepsilon$;
A fourth step of, if symbols of the sequence X remain to be processed, dividing the level-1 interval into level-2 intervals whose lengths are proportional to $\Pr(X_2 = S_k \mid X_1)$; and
A fifth step of repeating the second to fourth steps until all symbols of the sequence X have been processed.
Data mapping apparatus for a structured P2P network.

The apparatus of claim 1,
Wherein the arithmetic mapping unit converts the data space into a key space having numeric values.
Data mapping apparatus for a structured P2P network.

The apparatus of claim 2,
Wherein the arithmetic mapping unit converts the data space into a key space having hexadecimal values.
Data mapping apparatus for a structured P2P network.

A data mapping method performed by a data mapping apparatus for a structured P2P network, comprising:
Receiving a data space;
Converting the data space into a key space using an arithmetic mapping technique; and
Distributing the converted key space to peer nodes of a structured P2P network,
Wherein, where N is the number of symbols, the (N+1)-th symbol is the eos symbol, X is a sequence, and ε denotes the eos symbol, the arithmetic mapping technique comprises:
A first step of setting an interval;
A second step of dividing the interval into N+1 level-1 segments, where, for $k = 1, \ldots, N$, the length of the (k+1)-th segment is proportional to the probability of the symbol $S_k$;
A third step of mapping the sequence X to the (k+1)-th sub-interval if $X_1 = S_k$, and to the first sub-interval if $X_1 = \varepsilon$;
A fourth step of, if symbols of the sequence X remain to be processed, dividing the level-1 interval into level-2 intervals whose lengths are proportional to $\Pr(X_2 = S_k \mid X_1)$; and
A fifth step of repeating the second to fourth steps until all symbols of the sequence X have been processed.
Data mapping method for a structured P2P network.

The method of claim 4,
Wherein the converting step converts the data space into a key space having numeric values.
Data mapping method for a structured P2P network.

The method of claim 5,
Wherein the converting step converts the data space into a key space having hexadecimal values.
Data mapping method for a structured P2P network.