KR101272877B1

KR101272877B1 - Apparatus and method for seperating partition in distributed file system

Info

Publication number: KR101272877B1
Application number: KR1020090128432A
Authority: KR
Inventors: 박경현
Original assignee: 한국전자통신연구원
Priority date: 2009-12-21
Filing date: 2009-12-21
Publication date: 2013-06-11
Also published as: KR20110071783A

Abstract

A technique is provided for splitting partitions in a large-scale distributed file management system that distributes large volumes of data across partitions. A partition splitting apparatus of a distributed file system according to an embodiment of the present invention comprises: a frequency measuring unit that measures, every first period, the access frequency of a partition in which data is stored in a distributed manner; a load calculation unit that, every second period that is an integer multiple of the first period, generates a first load from the access frequencies corresponding to that second period; and a split control unit that, every third period that is an integer multiple of the second period, controls whether to split the partition based on a second load computed from the first loads corresponding to that third period.

Description

Partition splitting apparatus and method for a distributed file system {APPARATUS AND METHOD FOR SEPERATING PARTITION IN DISTRIBUTED FILE SYSTEM}

The present invention relates to a method for effectively splitting the partitions that manage data in a distributed data system handling large files. More specifically, it relates to a technique that splits a frequently accessed partition based on how often users access it, and that also takes access frequency into account when determining where to split the partition.

With advances in wired and wireless communication and in computer-related technologies, research is being conducted on techniques for managing data effectively. With the emergence of user-generated data such as UCC and of user-centric applications, the amount of data that must be managed at one time is also growing rapidly.

In addition, as multimedia data grows in capacity and computer processing speeds improve, the size of individual data items is also becoming very large. Data management techniques that can cope with data whose total volume is growing rapidly in both item size and item count are therefore urgently needed.

Distributed large-scale data systems exist for managing large volumes of data. Such a system consists of a master server and multiple partition servers. The master server manages the partition servers and tracks information such as which partition server holds a given piece of data. A partition server manages the partitions that hold the actual data, and the data is kept sorted sequentially by key.

When a user searches for data, the master server first provides information on which partition server holds the data, and the user then connects directly to that partition server and uses the data. In this arrangement, when many clients access a particular partition or particular data, the partition server that manages that partition comes under heavy load. In such cases, the partition is split and the resulting partitions are assigned to other available partition servers, which prevents the partition server from being overloaded.

However, for high-capacity data that is both very frequently accessed and very large, part of the data still remains in the existing partition even after the partition is split. A function that splits partitions uniformly is therefore not enough to distribute accesses to a frequently accessed partition effectively, and there is a growing need for a technique that actively controls partition splitting and reflects access frequency in the split.

In response to the above problems and needs, an object of the present invention is to provide a technique that effectively controls partition splitting and that can effectively control the access frequency of each partition when a partition is split.

To achieve the above object, a partition splitting apparatus of a distributed file system according to an embodiment of the present invention comprises: a frequency measuring unit that measures, every first period, the access frequency of a partition in which data is stored in a distributed manner; a load calculation unit that, every second period that is an integer multiple of the first period, generates a first load from the access frequencies corresponding to that second period; and a split control unit that, every third period that is an integer multiple of the second period, controls whether to split the partition based on a second load computed from the first loads corresponding to that third period.

A partition splitting method of a distributed file system according to an embodiment of the present invention comprises: measuring, by a frequency measuring unit, every first period, the access frequency of a partition in which data is stored in a distributed manner; generating, by a load calculation unit, every second period that is an integer multiple of the first period, a first load from the access frequencies corresponding to that second period; and controlling, by a split control unit, every third period that is an integer multiple of the second period, whether to split the partition based on a second load computed from the first loads corresponding to that third period.

In addition, when splitting a partition selected for splitting, the split control unit may further calculate, over the third period, a load based on the access frequency of each data element in the partition, and determine the position at which to split the partition based on the calculated load.

According to the partition splitting apparatus and method of a distributed file system according to an embodiment of the present invention, the partition to be split is selected according to users' access frequency. Because partitions are not split uniformly, partitions can be split efficiently and the load on a particular partition server can be distributed. In addition, because the load of each data element is calculated when a partition is split, accesses to high-capacity, frequently accessed data can be distributed effectively.

Hereinafter, a partition splitting apparatus of a distributed file system according to an embodiment of the present invention is described with reference to FIG. 1.

FIG. 1 is a block diagram of a partition splitting apparatus (100) of a distributed file system according to an embodiment of the present invention.

Referring to FIG. 1, the partition splitting apparatus (100) of a distributed file system according to an embodiment of the present invention includes a frequency measuring unit (101), a load calculation unit (103), and a split control unit (102).

In an embodiment of the present invention, large volumes of data are sorted by key value and stored in a single logical table. As the data grows, the table is divided into partitions. A partition (P1, P2 to Pn; hereinafter represented by P1 unless a specific partition is designated) is a subset obtained by horizontally dividing the table's data, and is the unit of access distribution and load measurement.

The frequency measuring unit (101) measures the access frequency of the partition (P1) every first period (for example, 1 second). The access frequency is also referred to as the frequency of each partition. The frequency measuring unit (101) is connected to the partition server (104) through a network (106).

The frequency measuring unit (101) measures the access frequency for the partitions on all partition servers. The frequency measuring unit (101) will therefore use a parallel measurement scheme capable of measuring the access frequency of the partitions on all partition servers.

The load calculation unit (103) generates a first load every second period (for example, 4 seconds), which is an integer multiple (for example, 4 times) of the first period, using the access frequencies corresponding to that second period.

Access frequency can change over time, and splitting partitions based on raw access frequency alone does not properly achieve the purpose of partition splitting for large volumes of data, because the partition to be split cannot be selected effectively. Therefore, a first load, which is a linear function of the access frequencies, is calculated and used to select the partition to be split.

When calculating the first load, the load calculation unit (103) uses, every second period that is an integer multiple of the first period, the access frequencies corresponding to that second period. "Corresponding to the second period" means that the access frequencies measured in the multiple first periods contained in the second period are used.

However, if the access frequencies measured in the multiple first periods were all treated equally, the load calculation would not be effective, because the current access frequency or load of a partition is also influenced by earlier access frequencies or loads. Using a simple average of the access frequencies per time unit is therefore not effective for deciding which partition to split.

Accordingly, when calculating the first load, the load calculation unit (103) assigns a weight to each time unit. That is, a weight is assigned to each first period, and the first load is computed by applying that weight to the access frequency measured in the corresponding first period. The following equation is used to calculate the first load.

Equation 1: L1 = (a·Ft1 + b·Ft2 + c·Ft3 + d·Ft4) / 4

In an embodiment of the present invention, the access frequencies of four first periods are used in the calculation. L1 denotes the first load of the partition, and Ft1, Ft2, Ft3, and Ft4 are the access frequencies measured in each first period, arranged in chronological order. The coefficients a, b, c, and d that multiply the access frequencies are variables that weight each first period according to its order in time. In an embodiment of the present invention, a = 1, b = 2, c = 3, and d = 4. However, any calculation method that can weight the access frequencies in chronological order may be used.
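The first-load calculation above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `first_load` and the sample access counts are assumptions made for the example, while the weights 1, 2, 3, 4 and the division by 4 follow Equation 1.

```python
from typing import Sequence


def first_load(frequencies: Sequence[float],
               weights: Sequence[float] = (1, 2, 3, 4)) -> float:
    """Weighted average of per-period access frequencies (Equation 1).

    `frequencies` must be in chronological order, oldest first, so the
    most recent first period receives the largest weight.
    """
    if len(frequencies) != len(weights):
        raise ValueError("need exactly one frequency per weight")
    weighted_sum = sum(w * f for w, f in zip(weights, frequencies))
    # Equation 1 divides by the number of first periods, not by the
    # sum of the weights.
    return weighted_sum / len(frequencies)


# Hypothetical access counts for four consecutive first periods.
rising = first_load([2, 4, 6, 8])   # (2 + 8 + 18 + 32) / 4
falling = first_load([8, 6, 4, 2])  # (8 + 12 + 12 + 8) / 4
```

Both sample sequences contain the same total number of accesses, but the rising one yields the larger first load because its recent periods carry more weight. A simple average would not distinguish the two.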

The split control unit (102) generates a second load by performing a further calculation on the first loads computed by the load calculation unit (103), and determines the partition to be split. Every third period (for example, 16 seconds), which is an integer multiple (for example, 4 times) of the second period, it controls whether to split a partition based on the second load computed from the first loads corresponding to that third period. The first loads corresponding to the third period are the multiple first loads computed for each of the integer number of second periods contained in the third period.

The load on a partition is calculated in two stages because this makes it relatively easy to set the threshold that distinguishes partitions that need splitting from those that do not, that is, the criterion for partition splitting. The two-stage load calculation is also needed to determine accurately which partition to split, so that partitions can be split effectively in response to accesses to large files.

As with the calculation in the load calculation unit (103), if the first loads computed for the multiple second periods were all treated equally, the load calculation would not be effective, because the current load of a partition is also influenced by earlier loads. Using a simple average of the first loads per time unit is therefore not effective for deciding which partition to split.

Accordingly, when calculating the second load, the split control unit (102) assigns a weight to each time unit and computes the second load from the first loads. That is, a weight is assigned to each second period, and the second load is computed by applying that weight to the first load of the corresponding second period. The following equation is used to calculate the second load.

Equation 2: L2 = (a·L1T1 + b·L1T2 + c·L1T3 + d·L1T4) / 4

In Equation 2, L2 denotes the second load of the measured partition over the third period. L1T1 to L1T4 denote the first load for each second period. The coefficients a, b, c, and d that multiply each second period are time-based weights, like the coefficients in Equation 1. Since four second periods make up one third period in this embodiment, a total of four first loads are used in the calculation. The coefficients a, b, c, and d are 1, 2, 3, and 4, respectively, as in Equation 1.
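A minimal sketch of the two-stage calculation, assuming the embodiment's settings (four first periods per second period, four second periods per third period, weights 1 to 4). The names `WEIGHTS`, `weighted_average`, and `second_load`, and the sample access counts, are hypothetical and chosen only for this example.

```python
from typing import Sequence

WEIGHTS = (1, 2, 3, 4)  # a, b, c, d as set in the embodiment


def weighted_average(values: Sequence[float]) -> float:
    """Common form of Equations 1 and 2: sum(w_i * v_i) / 4."""
    if len(values) != len(WEIGHTS):
        raise ValueError("need exactly one value per weight")
    return sum(w * v for w, v in zip(WEIGHTS, values)) / len(WEIGHTS)


def second_load(frequencies: Sequence[float]) -> float:
    """Second load for one third period of 16 chronological frequencies."""
    n = len(WEIGHTS)
    if len(frequencies) != n * n:
        raise ValueError("need 16 frequencies for one third period")
    # Stage 1 (Equation 1): one first load per second period.
    first_loads = [weighted_average(frequencies[i * n:(i + 1) * n])
                   for i in range(n)]
    # Stage 2 (Equation 2): weight the first loads in time order.
    return weighted_average(first_loads)


# Hypothetical access counts: the same burst placed early or late.
steady = second_load([4] * 16)
late_burst = second_load([0] * 12 + [4] * 4)
early_burst = second_load([4] * 4 + [0] * 12)
```

With these inputs a burst in the last second period contributes four times as much to the second load as the same burst in the first second period, which is the recency effect the weighting is meant to provide.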

The split control unit (102) decides whether to split a partition based on the computed second load. If the second load exceeds a threshold, the partition is judged to be a partition to split. The threshold depends on the unit of the access frequency and on the coefficients a, b, c, and d, so a detailed description is omitted.

Because the partition to split is determined from the access frequency and from loads computed from the access frequency at predetermined periods, partitions can be split accurately and efficiently, unlike conventional techniques that split partitions uniformly. Performance degradation caused by increased load on a partition server can therefore also be resolved effectively.

In an embodiment of the present invention, the split control unit (102) may further include a function for splitting the partition selected based on the second load. That is, it may further include a function for determining at which position in the data table that makes up the partition the partition should be split.

The split control unit (102) calculates the total access frequency of the data elements in the partition over the third period. For this purpose, the split control unit (102) receives the access frequency for each first period directly from the frequency measuring unit (101). The split control unit (102) sums, for example, the access frequencies of the 16 first periods for each data element. When splitting a partition, the load of each data element is calculated either from the total access frequency or by applying Equations 1 and 2, which reflect the two-stage weighting, in the same way as the first and second loads are calculated.

Once the load of each data element has been calculated, the partition is split at the data element that halves the total load. Splitting at the data element located in the middle of the partition, regardless of access frequency, would not reflect access frequency and would make it difficult to achieve the object of the present invention effectively. The split point is therefore the position at which the load, based on the access frequencies collected for each data element, is divided in half. This raises efficiency even when splitting partitions that hold large volumes of data, and allows accesses to be distributed effectively.
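One way to locate the load-halving element is a single pass over the per-element loads in key order. The sketch below is an assumption-laden illustration: the source does not specify how ties or inexact halves are handled, so `find_split_index` (a hypothetical name) simply picks the boundary that makes the two sides as close to equal as possible, and the sample loads are invented.

```python
from typing import Sequence


def find_split_index(element_loads: Sequence[float]) -> int:
    """Return i so that elements [0, i) and [i, n) carry near-equal load.

    `element_loads` holds one load value per data element, in key order.
    """
    if len(element_loads) < 2:
        raise ValueError("a partition needs at least two elements to split")
    total = sum(element_loads)
    best_index, best_gap = 1, float("inf")
    left = 0.0
    for i in range(1, len(element_loads)):
        left += element_loads[i - 1]
        gap = abs(total - 2 * left)  # |right side - left side|
        if gap < best_gap:
            best_index, best_gap = i, gap
    return best_index


# Hypothetical partition in which the first key is very hot.
skewed = [10, 1, 1, 1, 1, 1, 1]
load_based = find_split_index(skewed)  # boundary right after the hot key
midpoint = len(skewed) // 2            # what a position-only split would use
```

For the skewed sample, the load-based boundary isolates the hot element (10 versus 6 units of load), whereas a midpoint split would leave 12 units on one side and 4 on the other.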

To perform the above function, the frequency measuring unit (101), the load calculation unit (103), and the split control unit (102) may further include a function for measuring the access frequency or calculating the load for each data element in the partition.

FIG. 2 illustrates an example in which the load calculation unit (103) computes the first load.

Referring to FIG. 2, the access frequency of each of the partitions (P1, P2 to Pn) on a partition server is measured by the frequency measuring unit (101) every first period (t(i+1) - ti, where i is an integer). The numbers written above each partition's timeline are the access frequencies of that partition in each first period. The weights are reset every second period, and are set to 1, 2, 3, and 4 in order from the earliest first period, between t0 and t1, to the latest, between t3 and t4.

FIG. 2 compares partitions P1, P2, and Pn. The access frequency of each first period is multiplied by its weight, and the result is averaged over the number of first periods (4). As the computed first loads (200) show, the first loads of partitions P1, P2, and Pn over one second period (t0 to t4) are 13, 9, and 10.75, respectively.

FIG. 3 illustrates an example in which the split control unit (102) computes the second load. Description that overlaps that of FIGS. 1 and 2 is omitted.

Referring to FIG. 3, the first loads computed by the load calculation unit (103) are processed by the split control unit (102) for each second period (T(i+1) - Ti, where i is an integer). The numbers written above the timeline of each partition (P1, P2, Pn) are the first loads computed for that partition in each second period. The weights are reset every third period, and are set to 1, 2, 3, and 4 in order from the earliest second period, between T0 and T1, to the latest, between T3 and T4.

FIG. 3 compares partitions P1, P2, and Pn. The first load of each second period is multiplied by its weight, and the result is averaged over the number of second periods (4). As the computed second loads (300) show, the second loads of partitions P1, P2, and Pn over one third period (T0 to T4) are 58, 43.75, and 37.5, respectively.

Each partition (P1, P2 to Pn) is then evaluated to determine whether it should be split. A partition whose second load exceeds the threshold (for example, 50), here P1, is selected for splitting.
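The selection step can be sketched with the second loads and the example threshold quoted from FIG. 3. The function name `partitions_to_split` and the dictionary layout are assumptions made for this illustration.

```python
from typing import Dict, List


def partitions_to_split(second_loads: Dict[str, float],
                        threshold: float) -> List[str]:
    """Return the partitions whose second load exceeds the threshold."""
    return [name for name, load in second_loads.items() if load > threshold]


# Second loads stated for FIG. 3 and the example threshold of 50.
fig3_second_loads = {"P1": 58, "P2": 43.75, "Pn": 37.5}
selected = partitions_to_split(fig3_second_loads, threshold=50)
```

Only P1 exceeds 50, which matches the selection described for FIG. 3. The comparison is strict, following the text's wording that the load must exceed the threshold.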

FIG. 4 illustrates an example of partition splitting by the split control unit (102).

Referring to FIG. 4, the split control unit (102) determines where to split a partition based on access frequency.

Once a partition to split has been selected, the split control unit (102) determines at which data element to split it. FIG. 4 assumes access frequency graphs (410, 420) for two partitions selected for splitting.

Referring to the access frequency graph (410) for the first partition (430), assume a partition whose data elements (or rows of the data table) are laid out horizontally in key order. The split control unit (102) calculates the access frequency of each data element. In FIG. 4, the y-axis of the graphs (410, 420) is the access frequency, and the x-axis is the position of the horizontally arranged data elements.

In the graph (410) for the first partition (430), the area under the graph represents the load over the partition's data elements. Accordingly, the first partition (430) is split at the data element (411) corresponding to the point (400) that bisects the total area of the graph from 0 to the last data element (401), that is, the point at which S1 = S2 (where S1 and S2 are the areas of the two parts of the graph).

The access frequency graph (420) for the second partition (440) shows a different access frequency distribution from the graph (410) for the first partition (430). The split control unit (102) therefore also calculates the access frequency of each data element for the second partition (440).

In the graph (420) for the second partition (440), the area under the graph likewise represents the load over the partition's data elements. Accordingly, the second partition (440) is split at the data element (412) corresponding to the point (402) that bisects the total area of the graph from 0 to the last data element (403), that is, the point at which S3 = S4 (where S3 and S4 are the areas of the two parts of the graph).

FIG. 5 is a flowchart of a partition splitting method of a distributed file system according to an embodiment of the present invention. In the following, description that overlaps that of FIGS. 1 to 4 is omitted.

Referring to FIG. 5, in the partition splitting method of a distributed file system according to an embodiment of the present invention, it is first determined whether a first period has elapsed (S1), and each time a first period elapses the frequency measuring unit (101) measures the access frequency of the partition (S2).

The load calculation unit (103) then determines whether a second period has elapsed (S3), and each time a second period elapses it calculates the first load (S4) based on Equation 1 and the access frequencies corresponding to that second period.

The split control unit (102) determines whether a third period has elapsed (S5), and each time a third period elapses it computes the second load (S6) based on Equation 2 and the first loads corresponding to that third period.

Once the second load has been computed, the split control unit (102) determines whether the second load of the partition under evaluation exceeds the threshold (Lth) (S7). If it does not, new first to third periods begin; if it does, the partition is split.
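Steps S1 to S7 can be modelled as a small state machine that is fed one access frequency per first period. This is only a sketch under the embodiment's settings (4 first periods per second period, 4 second periods per third period); the class name `PartitionLoadMonitor`, its `record` method, and the sample inputs are hypothetical and do not come from the source.

```python
from typing import List, Optional


class PartitionLoadMonitor:
    """Tracks one partition and reports the S7 decision every third period."""

    WEIGHTS = (1, 2, 3, 4)  # a, b, c, d from the embodiment

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold            # Lth
        self.frequencies: List[float] = []    # current second period
        self.first_loads: List[float] = []    # current third period

    def _weighted(self, values: List[float]) -> float:
        total = sum(w * v for w, v in zip(self.WEIGHTS, values))
        return total / len(self.WEIGHTS)

    def record(self, frequency: float) -> Optional[bool]:
        """Call once per first period (S1, S2).

        Returns None until a third period completes, then True if the
        partition should be split and False otherwise.
        """
        self.frequencies.append(frequency)
        if len(self.frequencies) < len(self.WEIGHTS):
            return None
        # S3, S4: a second period has elapsed; compute the first load.
        self.first_loads.append(self._weighted(self.frequencies))
        self.frequencies.clear()
        if len(self.first_loads) < len(self.WEIGHTS):
            return None
        # S5, S6: a third period has elapsed; compute the second load.
        second_load = self._weighted(self.first_loads)
        self.first_loads.clear()
        # S7: compare against the threshold; new periods start either way.
        return second_load > self.threshold


# Hypothetical steady traffic of 10 accesses per first period.
monitor = PartitionLoadMonitor(threshold=50)
busy_results = [monitor.record(10) for _ in range(16)]
quiet_results = [monitor.record(4) for _ in range(16)]
```

Because all buffers are cleared at the same boundaries, the first, second, and third periods always start together, which is the alignment the embodiment requires.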

추가적으로, 도 5에 도시되지는 않았지만, 분할 제어부(102)가 상기 언급한 바와 같이 데이터 요소별로 측정 또는 연산된 접속 빈도, 제1 로드 또는 제2 로드에 근거하여 파티션 분할 대상으로 선택된 파티션의 제2 로드를 반으로 나누는 데이터 요소를 기준으로 파티션을 분할하는 단계가 더 포함될 수 있을 것이다.In addition, although not shown in FIG. 5, the second part of the partition selected as the partitioning target based on the connection frequency, the first load or the second load, measured or calculated for each data element, as described above. Partitioning may be further included based on data elements that divide the load in half.

본 발명의 실시 예에서 제1 내지 제3 주기의 시작점은 같은 것으로 설정되어 있다. 제1 내지 제3 주기의 시작점이 같이 않은 경우 상기 언급한 접속 빈도, 제1 로드 및 제2 로드의 계산에 있어서 시간 차에 의한 오류가 발생하기 때문이다.In an embodiment of the present invention, the starting points of the first to third cycles are set to be the same. This is because an error due to time difference occurs in the calculation of the above-mentioned connection frequency, the first load, and the second load when the start points of the first to third cycles are not the same.

The above description of the partitioning apparatus and method of the distributed file system according to the embodiment of the present invention is intended for illustrative purposes only and should not limit the scope of the claims.

In addition to the embodiment described above, equivalent inventions that perform the same function as the present invention will naturally also fall within the scope of the claims.

FIG. 3 illustrates an example in which the division controller 102 computes the second load.

FIG. 5 is a flowchart of a partitioning method of a distributed file system according to an embodiment of the present invention.

Claims

A partitioning apparatus of a distributed file system, comprising:

a frequency measuring unit that measures, every first period, the frequency of access to a partition in which data is distributed and stored;

a load calculation unit that generates, every second period that is an integer multiple of the first period, a first load from the access frequencies corresponding to that second period; and

a division control unit that controls whether to split the partition based on a second load computed, every third period that is an integer multiple of the second period, from the first loads corresponding to that third period.

The apparatus according to claim 1,

wherein the load calculation unit and the division control unit

compute the first load and the second load, respectively, by assigning a weight to each time unit.

The apparatus according to claim 1,

wherein the load calculation unit

computes the first load by the equation L1 = (aFt1 + bFt2 + cFt3 + dFt4) / 4,

where L1 is the first load, Ft1, Ft2, Ft3, and Ft4 are the access frequencies measured in each first period, listed in chronological order, and a, b, c, and d are the weights for the respective first periods.

The apparatus according to claim 1,

wherein the division control unit

computes the second load by the equation L2 = (aL1T1 + bL1T2 + cL1T3 + dL1T4) / 4,

where L2 is the second load, L1T1, L1T2, L1T3, and L1T4 are the first loads for each second period, and a, b, c, and d are the weights for the respective second periods.

The apparatus according to claim 1,

wherein the division control unit,

when splitting the partition to be split, computes the second load corresponding to the third period and determines the location at which to split the partition based on that second load.

A partitioning method of a distributed file system, comprising:

measuring, by a frequency measuring unit, every first period, the frequency of access to a partition in which data is distributed and stored;

generating, by a load calculation unit, every second period that is an integer multiple of the first period, a first load from the access frequencies corresponding to that second period; and

controlling, by a division control unit, whether to split the partition based on a second load computed, every third period that is an integer multiple of the second period, from the first loads corresponding to that third period.

The method of claim 6,

wherein generating the first load

comprises computing the first load by assigning a weight to each first period, and

controlling whether to split the partition

comprises computing the second load by assigning a weight to each second period.

The method of claim 6,

wherein generating the first load

comprises computing the first load according to the equation L1 = (aFt1 + bFt2 + cFt3 + dFt4) / 4.

The method of claim 6,

wherein controlling whether to split the partition

comprises computing the second load according to the equation L2 = (aL1T1 + bL1T2 + cL1T3 + dL1T4) / 4.

The method of claim 6,

wherein controlling whether to split the partition comprises:

computing the second load corresponding to the third period when the partition to be split is split; and

determining the location at which to split the partition based on that second load.