KR20100072770A

KR20100072770A - Hot data management based on hit counter from data servers in parallelism

Info

Publication number: KR20100072770A
Application number: KR1020080131277A
Authority: KR
Inventors: 이상민; 김홍연; 김영균; 남궁한
Original assignee: 한국전자통신연구원
Priority date: 2008-12-22
Filing date: 2008-12-22
Publication date: 2010-07-01
Also published as: KR101189766B1; US8126997B2; US20100161780A1

Abstract

PURPOSE: A hot data management method based on access counts collected in a distributed manner from data servers is provided to detect and resolve hot data through effective load distribution, even when many user requests occur at a given time in an asymmetric storage system, thereby providing a stable data service. CONSTITUTION: Each data server of an asymmetric storage system monitors the number of accesses to the data it stores. Each data server keeps the per-data access count information up to date. Each data server transmits the access count information to a metadata server at a predetermined cycle. According to the decision of the metadata server, each data server performs replication of data or deletion of a replica (S971).

Description

Hot Data Management Based on Hit Counter from Data Servers in Parallelism

The present invention relates to an asymmetric storage system and, more particularly, to an access-count-based hot data management method that manages hot data efficiently in an asymmetric storage system and thereby prevents load from concentrating on a data server because of hot data.

The present invention is derived from research conducted as part of the IT Growth Engine Technology Development Project of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: 2007-S-016-02, Project title: Development of a low-cost, large-scale global Internet service solution].

Most large-capacity storage systems process metadata and data separately to ensure high scalability, and adopt an asymmetric structure in which a metadata server manages the metadata and data servers manage the data. Here, metadata means the location information of the data server that stores the actual data of a file.

A data server that stores and manages data provides the actual data stored on its disks over the network at a user's request. The performance that a single data server can deliver is limited by the disk performance of the data server and the transmission performance of the network.

For example, when a large-scale video service such as UCC (User Created Contents) is provided and a particular video file is accessed heavily during a certain period, many read requests arrive at the data server that stores and manages that data. However, because the data service can be provided only up to the peak performance of the disk or network, failures (e.g., video stuttering) occur not only for the additional requests but also for the video services already being delivered to users.

In an asymmetric storage system, when many users issue intensive read requests for a specific file during a certain period (such a file is hereinafter referred to as hot data), smooth data service cannot be provided because of the limits of the physical performance (that is, disk and network performance) of the data server that stores and manages the data of that file. If one tries to solve this by detecting and resolving hot data using the metadata access count at a single metadata server rather than at the data servers, the number of file read requests, which is the actual load on the data, cannot be tracked. In addition, the access count value has to be updated every time the metadata is accessed, which generates a heavy load.

Furthermore, hot data has the property that, even once data has become hot, it does not remain hot after a certain time has passed. If this property is not taken into account, the replicas additionally created to resolve the hot data waste storage.

The present invention is intended to solve the above problems. An object of the present invention is to provide an access-count-based hot data management method that manages hot data efficiently in an asymmetric storage system and thereby prevents load from concentrating on a data server because of hot data.

Another object of the present invention is to provide an access-count-based hot data management method that can provide a stable data service by detecting and resolving hot data through efficient load distribution, even when many user requests occur at a given time in an asymmetric storage system.

Still another object of the present invention is to provide an access-count-based hot data management method in which each data server collects data access counts and sends them to the metadata server at a regular period so that hot data is detected, resolved, and tracked, allowing the read data service to be provided smoothly to users even when hot data occurs.

Still another object of the present invention is to provide an access-count-based hot data management method that tracks the list of hot data already resolved through replication and reclaims the allocated data once it is no longer hot, thereby preventing waste of storage.

To achieve these objects, the present invention provides an access-count-based hot data management method comprising: monitoring, by each data server of an asymmetric storage system, the number of accesses to the data it stores and keeping the per-data access count information up to date; transmitting, by each data server, the access count information to a metadata server at a predetermined cycle; and performing, by each data server, replication of data or deletion of a replica according to the decision of the metadata server.

According to another aspect of the present invention, there is provided an access-count-based hot data management method comprising: constructing a hot data management table that includes an access count field for each piece of data; collecting, from one or more data servers, access count information for the data stored by each data server; updating the hot data management table according to the access count information; checking the hot data management table at a predetermined cycle to determine whether data is hot data; replicating a data file determined to be hot data to a new data server; and deleting the replica of a data file determined to be no longer hot data.

According to still another aspect of the present invention, there is provided a hot data management method comprising: maintaining, by a plurality of data servers, for a certain period, the number of data accesses to the data each stores and manages; transmitting, by the plurality of data servers, the data access counts for the stored data to a management server at every certain cycle; collecting and storing, by the management server, the transmitted data access counts; and recognizing, by the management server at every certain cycle, data whose access count on each data server within the cycle time exceeds a preset threshold as hot data, and additionally replicating the data of the hot file to one or more of the plurality of data servers.

According to the present invention, a stable data service can be provided by detecting and resolving hot data through efficient load distribution, even when many user requests occur at a given time in an asymmetric storage system.

In addition, because each data server collects data access counts and sends them to the metadata server at a regular period so that hot data is detected, resolved, and tracked, the read data service can be provided smoothly to users even when hot data occurs.

Moreover, by tracking the list of hot data already resolved through replication and reclaiming the allocated data once it is no longer hot, waste of storage can be prevented.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of an asymmetric storage system to which embodiments of the present invention are applied.

Referring to FIG. 1, the asymmetric storage system to which embodiments of the present invention are applied includes n user file systems 110-1 to 110-n, a metadata server 120, and m data servers 130-1 to 130-m.

The user file systems 110-1 to 110-n receive file-related requests from users. In response to a user's request, the user file systems 110-1 to 110-n request metadata from the metadata server 120 and request the actual data of the file from the data servers 130-1 to 130-m.

The metadata server 120, which manages the location information of the actual data of files, checks the validity of a request from a specific user file system among the user file systems 110-1 to 110-n and transmits the requested metadata (that is, the location information of the file data) to that user file system over the network.

The data servers 130-1 to 130-m, which manage the actual data of files, deliver data from disk to the specific user file system at its request.

To collect information on where read load occurs, the data servers 130-1 to 130-m maintain data access counts for users' data read requests using the structure shown in FIG. 2.

Each of the data access count entries 210 to 212 includes a disk identifier field and a data identifier field for identifying the data, an access count field for recording the number of accesses, a hash list field, and a top list field. A hash function 201 and a hash table 202 having MAX hash heads are used to find the data access count entry quickly for a user's read request.

The hash function 201 takes the data identifier of the user request as input and returns the remainder of dividing the data identifier by MAX (data identifier % MAX). This hash result value selects one entry of the hash table 202.

If another data identifier yields the same hash result value, the data access count entries are chained through the hash list field, which holds the previous (prev) and next (next) entry information, as with data access count entry 210 and data access count entry 211 in FIG. 2.

To maintain the ranking of the data by access count, the data access count entries 210 to 212 are linked, starting from the top list head, through the top list field, which holds the previous (prev) and next (next) entry information of each data access count entry.

In the example shown in FIG. 2, data access count entry 211 follows the top list head, and data access count entries 212 and 210 are then linked in that order. It is preferable to arrange the list so that the closer an entry is to the top list head, the higher its priority, that is, the more accesses the corresponding file has received.
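As an illustration only, the structure of FIG. 2 could be sketched in Python as follows. The class name `AccessEntry`, the helper names, the use of self-referencing sentinel heads, and the value `MAX = 8` are assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional

MAX = 8  # number of hash heads; 8 is an assumed value for illustration


@dataclass(eq=False)  # eq=False: entries are compared by identity
class AccessEntry:
    """Hypothetical layout of one data access count entry (cf. 210 to 212)."""
    disk_id: int = -1
    data_id: int = -1
    count: int = 0
    hash_prev: Optional["AccessEntry"] = None  # hash list field
    hash_next: Optional["AccessEntry"] = None
    top_prev: Optional["AccessEntry"] = None   # top list field
    top_next: Optional["AccessEntry"] = None


def hash_fn(data_id: int) -> int:
    """Hash function 201: remainder of the data identifier divided by MAX."""
    return data_id % MAX


def new_head() -> AccessEntry:
    """A list head is modelled as a sentinel entry that points to itself."""
    head = AccessEntry()
    head.hash_prev = head.hash_next = head
    return head


hash_table = [new_head() for _ in range(MAX)]  # hash table 202


def chain(entry: AccessEntry) -> None:
    """Link an entry into its bucket right after the hash list head."""
    head = hash_table[hash_fn(entry.data_id)]
    entry.hash_prev, entry.hash_next = head, head.hash_next
    head.hash_next.hash_prev = entry
    head.hash_next = entry


a = AccessEntry(disk_id=1, data_id=3, count=1)
b = AccessEntry(disk_id=1, data_id=11, count=1)  # 11 % 8 == 3: same bucket as a
chain(a)
chain(b)
```

Here the two entries collide in bucket 3 and are reached one after the other from the same hash list head, which is the chaining behaviour described for entries 210 and 211.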

FIG. 3 is a flowchart illustrating a method of processing data access counts according to an embodiment of the present invention. In the following, the data server is any one of the data servers 130-1 to 130-m of FIG. 1.

As shown in FIG. 3, when the data server starts, it initializes the hash table (S301) and waits for a user request (S302). When a user request for a read or a delete is received (S310), the data server updates the data access count through the steps described below.

That is, the data server feeds the data identifier into the hash function to obtain a hash result value (S311). The data server then uses the hash result value to obtain an entry of the hash table, that is, a hash list head (S312), and puts the next entry of that head into a temporary entry (S313).

The data server determines whether the temporary entry is the same as the hash list head (S320). If they are not the same, it determines whether the disk identifier and data identifier of the temporary entry match those of the user request (S330). If they match, the data server determines whether the user request is a delete request or a read request (S340).

If the user request is a delete request, the data server removes the data access count entry from the hash list (S341) and also removes it from the top list (S342). The data server then discards the data access count entry (S343).

If the determination in step S340 shows that the user request is a read request, the data server increments the access count of the data access count entry by one (S351) and updates the position of the entry in the top list (S352).

If the determination in step S320 shows that the temporary entry is the same as the hash list head, that is, the list holds no matching entry, the data server creates a new data access count entry, stores the disk identifier and data identifier in it, and initializes its access count to '1' (S361). The data server then inserts the hash list field of the new entry at the hash list head (S362) and appends the top list field of the new entry to the end of the top list (S363).

If the determination in step S330 shows that the disk identifier and data identifier do not match, the data server puts the next entry into the temporary entry (S371) and returns to step S320.

In summary, when a data read or delete request is received, the data server feeds the data identifier into the hash function to obtain a result value and then checks whether the hash list headed by the corresponding hash table entry contains a data access count entry with that data identifier. If the entry exists and the request is a read request, the data server increments the access count field by one and updates the entry's position in the top list. If the entry does not exist, it creates and initializes a new data access count entry and inserts it at the hash list head. If the entry exists and the request is a delete request, it removes the data access count entry.
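The flow of FIG. 3 might be sketched as below. This is a minimal Python sketch, not the patented implementation: the names `Entry`, `handle_request`, `unlink`, and `insert_after`, the value `MAX = 16`, and the circular lists with sentinel heads are all assumptions. The sketch also assumes that a delete request for data with no entry creates nothing, a case the description does not spell out, and it leaves the rank update of step S352 (FIG. 4) out.

```python
MAX = 16  # number of hash heads (assumed value)


class Entry:
    """Data access count entry; also used as a sentinel list head."""
    def __init__(self, disk_id=None, data_id=None):
        self.disk_id, self.data_id, self.count = disk_id, data_id, 0
        self.hash_prev = self.hash_next = self   # hash list field
        self.top_prev = self.top_next = self     # top list field


hash_table = [Entry() for _ in range(MAX)]       # S301: initialise the table
top_head = Entry()                               # top list head


def unlink(entry, prev_attr, next_attr):
    """Remove entry from the circular list named by the two attributes."""
    p, n = getattr(entry, prev_attr), getattr(entry, next_attr)
    setattr(p, next_attr, n)
    setattr(n, prev_attr, p)


def insert_after(anchor, entry, prev_attr, next_attr):
    """Insert entry right after anchor in the named circular list."""
    n = getattr(anchor, next_attr)
    setattr(entry, prev_attr, anchor)
    setattr(entry, next_attr, n)
    setattr(anchor, next_attr, entry)
    setattr(n, prev_attr, entry)


def handle_request(kind, disk_id, data_id):
    head = hash_table[data_id % MAX]             # S311, S312
    tmp = head.hash_next                         # S313
    while tmp is not head:                       # S320
        if tmp.disk_id == disk_id and tmp.data_id == data_id:   # S330
            if kind == "delete":                 # S340
                unlink(tmp, "hash_prev", "hash_next")           # S341
                unlink(tmp, "top_prev", "top_next")             # S342
                return None                      # S343: entry is discarded
            tmp.count += 1                       # S351 (S352 omitted here)
            return tmp
        tmp = tmp.hash_next                      # S371
    if kind == "delete":
        return None                              # assumption: nothing to create
    entry = Entry(disk_id, data_id)              # S361
    entry.count = 1
    insert_after(head, entry, "hash_prev", "hash_next")              # S362
    insert_after(top_head.top_prev, entry, "top_prev", "top_next")   # S363
    return entry


first = handle_request("read", 1, 5)
handle_request("read", 1, 5)             # second read of the same data
other = handle_request("read", 1, 21)    # 21 % 16 == 5: collides with data 5
```

After these three requests the entry for data 5 holds a count of 2, and both entries hang off hash head 5, with the newer one closer to the head.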

FIG. 4 is a flowchart illustrating a method of changing the ranking of data access counts according to an embodiment of the present invention.

Referring to FIG. 4, the data server takes the previous entry from the top list field of the data access count entry whose access count has been increased (S401) and puts that previous entry into a temporary entry (S402).

The data server determines whether the temporary entry is the same as the top list head (S410). If it is not, the data server determines whether the access count of the temporary entry is greater than or equal to the access count of the data access count entry (S420).

If a temporary entry with a greater or equal access count is found, the data access count entry is set as the next entry in the top list of the temporary entry (more precisely, of the data access count entry currently held in the temporary entry) (S423). In addition, the temporary entry is set as the previous (prev) entry of the data access count entry. This process changes the ranking among data access count entries.

If the determination in step S420 shows that the access count of the temporary entry is not greater than or equal, the data server takes the previous entry from the top list field of the temporary entry (S421) and puts it into the temporary entry (S422). In this way the data server walks through the entries of higher priority, that is, those with more accesses.

If no entry with an access count equal to or higher than that of the data access count entry is found before the top list head is reached, the data access count entry becomes the highest-priority entry. The data server therefore removes the entry from its current position in the top list (S431) and sets it as the next entry of the top list head (S432).
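The rank update of FIG. 4 could be sketched as follows, assuming a circular doubly linked top list with a sentinel head. The names `Entry`, `promote`, `insert_after`, `unlink`, and `ranking` are hypothetical. The early return when the entry is already in place and the explicit unlinking before reinsertion are details added for this sketch; the description mentions the removal only for the case where the head is reached.

```python
class Entry:
    """Top list node; the head is a sentinel that points to itself."""
    def __init__(self, name, count=0):
        self.name, self.count = name, count
        self.top_prev = self.top_next = self


def insert_after(anchor, entry):
    nxt = anchor.top_next
    entry.top_prev, entry.top_next = anchor, nxt
    anchor.top_next = nxt.top_prev = entry


def unlink(entry):
    entry.top_prev.top_next = entry.top_next
    entry.top_next.top_prev = entry.top_prev


def promote(head, entry):
    """Move entry towards the head until a count >= its own is found."""
    tmp = entry.top_prev                                  # S401, S402
    while tmp is not head and tmp.count < entry.count:    # S410, S420
        tmp = tmp.top_prev                                # S421, S422
    if tmp is entry.top_prev:
        return                                            # rank unchanged
    unlink(entry)                                         # S431
    insert_after(tmp, entry)                              # S423 or S432


def ranking(head):
    """Names of the entries from highest to lowest rank."""
    names, cur = [], head.top_next
    while cur is not head:
        names.append(cur.name)
        cur = cur.top_next
    return names


head = Entry("head")
a, b, c = Entry("a", 5), Entry("b", 3), Entry("c", 3)
for e in (a, b, c):
    insert_after(head.top_prev, e)    # append at the tail

c.count += 1       # a read raises c to 4 accesses
promote(head, c)   # c moves ahead of b but stays behind a
```

Because the walk stops at the first entry whose count is greater than or equal, an entry that ties with its predecessor keeps its place, so entries with equal counts keep their existing relative order.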

FIG. 5 is a flowchart illustrating a method of transmitting data access counts according to an embodiment of the present invention.

Each data server collects data access counts (S510) and determines whether a preset cycle has been reached (S520). When the cycle is reached, it checks the maximum number of entries to transmit (S530) and transmits that many access count records to the metadata server (S540). It then initializes the access count entries (S550).

Of course, the step of checking the maximum number of entries to transmit (S530) need not be performed every time.
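One reporting cycle of FIG. 5 might look like the sketch below. It assumes the counts are held in a plain dictionary keyed by (disk identifier, data identifier), that the entries with the highest counts are the ones selected, and that initialization means clearing all counts; the function names and the `send` callback are hypothetical stand-ins for the transfer to the metadata server.

```python
import heapq


def build_report(counts, max_entries):
    """Select up to max_entries most accessed items (S530), then reset (S550)."""
    report = heapq.nlargest(max_entries, counts.items(), key=lambda kv: kv[1])
    counts.clear()
    return report


def run_cycle(counts, max_entries, send):
    """Work done once the reporting cycle has been reached (S520)."""
    report = build_report(counts, max_entries)
    send(report)                       # S540: transmit to the metadata server
    return report


counts = {(1, 5): 10, (1, 6): 2, (2, 7): 7}   # (disk id, data id) -> accesses
sent = []                                      # stands in for the network
run_cycle(counts, max_entries=2, send=sent.append)
```

With a maximum of two entries, only the records for (1, 5) and (2, 7) are sent, and the local counts start again from empty for the next cycle.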

FIG. 6 is a diagram illustrating the configuration of the hot data management table maintained by the metadata server that receives the access count information, according to an embodiment of the present invention.

As shown in FIG. 6, the hot data management table, which is stored in a database (DB) provided in the metadata server, includes a file (inode) identifier field 601, a last access time field 602, an access count field 603, and an additional replication field 604.

The file identifier field 601 holds a value that identifies a file; in a virtual file system (VFS), it is the inode identifier.

The last access time field 602 holds the most recent time at which the access count value of the file data received from the data servers was updated.

The additional replication field 604 is used for tracking and managing hot data and indicates whether an additional replica has been created to resolve the hot data.

Reference numeral 620 denotes the structure of the access count field 603. Fields 621-1 to 621-60, namely min[0], min[1], ..., min[59], hold the per-minute file access counts relative to the last_min (621) time. Fields 631-1 to 631-24, namely hour[0] to hour[23], hold the per-hour file access counts relative to the last_hour (631) time. Fields 641-1 to 641-365, namely day[0] to day[364], hold the per-day file access counts relative to the last_day (641) time.
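For illustration, one row of the hot data management table could be modelled in Python as below. The class and attribute names, and the use of floating-point timestamps, are assumptions of this sketch; only the field roles and the array sizes (60, 24, 365) come from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AccessCounts:
    """Access count field 603 with the structure denoted by 620."""
    last_min: float = 0.0    # reference time for per_min (621)
    last_hour: float = 0.0   # reference time for per_hour (631)
    last_day: float = 0.0    # reference time for per_day (641)
    per_min: List[int] = field(default_factory=lambda: [0] * 60)    # min[0..59]
    per_hour: List[int] = field(default_factory=lambda: [0] * 24)   # hour[0..23]
    per_day: List[int] = field(default_factory=lambda: [0] * 365)   # day[0..364]


@dataclass
class HotDataRow:
    """One row of the hot data management table (fields 601 to 604)."""
    file_id: int                       # 601: file (inode) identifier
    last_access: float                 # 602: last access time
    counts: AccessCounts = field(default_factory=AccessCounts)      # 603
    extra_replica: bool = False        # 604: additional replica created?


row = HotDataRow(file_id=42, last_access=0.0)
row.counts.per_min[0] += 3   # three accesses in the current minute
```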

FIG. 7 is a flowchart illustrating a method of storing data access count information according to an embodiment of the present invention. That is, FIG. 7 shows the process of storing the data access count information received from each data server (for example, disk identifier, data identifier, and access count) in the hot data management table. In the following, the metadata server is the metadata server 120 of FIG. 1, and the data server is any one of the data servers 130-1 to 130-m of FIG. 1.

Referring to FIG. 7, the metadata server receives a data access count value from the data server (S701) and looks up the file identifier to which the disk identifier and data identifier belong (S702).

Next, the metadata server determines whether a file identifier corresponding to the disk identifier and data identifier exists (S710). If it does not exist, the file is regarded as deleted, and the process returns to the receiving step (S701) to obtain the next data access count information.

If the file identifier exists, the metadata server determines whether a data access count entry with that file identifier exists in the hot data management table (S720).

If a data access count entry with the file identifier exists, the metadata server fetches the access count field from the hot data management table (S721), increases specific fields of the access count field, for example the min[0], hour[0], and day[0] values, by the received data access count value (S722), and updates the last access time to the current time (S723), thereby updating the access count field.

If the determination in step S720 shows that no data access count entry with the file identifier exists, the metadata server adds a new data access count entry with the file identifier to the hot data management table (S731). The metadata server initializes all fields of the access count field of the new entry, namely min[0] to min[59], hour[0] to hour[23], and day[0] to day[364], to '0' (S732). The metadata server then sets last_min, last_hour, and last_day of the access count field of the new entry to the current time (S733) and proceeds to the field increase step (S722).
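The storing procedure of FIG. 7 could be sketched as follows. The table is modelled as a dictionary keyed by file identifier, each row as a plain dictionary, and `locate_file` is a hypothetical lookup from (disk identifier, data identifier) to a file identifier that returns `None` for deleted files; none of these names come from the specification.

```python
def new_row(now):
    """S731 to S733: fresh row with zeroed counters and current timestamps."""
    return {
        "min": [0] * 60, "hour": [0] * 24, "day": [0] * 365,     # S732
        "last_min": now, "last_hour": now, "last_day": now,      # S733
        "last_access": now, "extra_replica": False,
    }


def record_report(table, locate_file, disk_id, data_id, count, now):
    """Store one access count record received from a data server (S701)."""
    file_id = locate_file(disk_id, data_id)       # S702
    if file_id is None:                           # S710: treated as deleted
        return False
    row = table.get(file_id)                      # S720, S721
    if row is None:
        row = table[file_id] = new_row(now)       # S731
    for slot in ("min", "hour", "day"):           # S722
        row[slot][0] += count
    row["last_access"] = now                      # S723
    return True


# Hypothetical mapping kept by the metadata server for this example.
placement = {(1, 5): 100}


def locate(disk_id, data_id):
    return placement.get((disk_id, data_id))


table = {}
record_report(table, locate, 1, 5, count=3, now=1000.0)
record_report(table, locate, 1, 5, count=4, now=1060.0)
ignored = record_report(table, locate, 9, 9, count=1, now=1060.0)
```

The two reports for file 100 accumulate in slot 0 of each window, while the report for an unknown (disk, data) pair is dropped without creating a row.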

도 8은 본 발명의 일실시예에 따른 접근 횟수 필드의 갱신 방법을 나타낸 흐름도이다. 즉, 도 8은 상기 핫 데이터 관리 테이블의 접근 횟수 필드(즉, min[0] 내지 min[59], hour[0] 내지 hour[23] 및 day[0] 내지 day[364])를 현재 시간을 기준으로 갱신하는 과정을 나타낸 것이다. 이하에서, 메타데이터 서버는 도 1의 메타데이터 서버(120)와 동일한 서버이다.8 is a flowchart illustrating a method of updating an access count field according to an embodiment of the present invention. That is, FIG. 8 shows the number of access fields (ie, min [0] to min [59], hour [0] to hour [23], and day [0] to day [364]) of the hot data management table. It shows the process of updating on the basis of. Hereinafter, the metadata server is the same server as the metadata server 120 of FIG. 1.

도 8을 참조하면, 상기 메타데이터 서버는 현재 시간과 접근 횟수 필드의 last_day를 비교하여 last_day가 하루 이상 경과되었는지를 판단한다(S810).Referring to FIG. 8, the metadata server compares last_day of the current time and the access count field to determine whether last_day has elapsed for more than one day (S810).

판단결과 하루 이상 경과하였으면, 상기 메타데이터 서버는 접근 횟수 필드의 day[0] 내지 day[364]를 경과일(day)만큼 오른쪽으로 시프트시키고(S811), day[0] 부터 day[경과일 - 1]까지의 필드를 '0'으로 초기화한다(S812). 그리고, 상기 메타데이터 서버는 접근 횟수 필드의 min[0] 내지 min[59] 및 hour[0] 내지 hour[23]을 '0'으로 초기화하고(S813), last_min, last_hour 및 last_day를 현재 시간으로 갱신한다(S814).If it is determined that more than one day has passed, the metadata server shifts the day [0] to day [364] of the access count field to the right by the elapsed day (S811), and from day [0] to day [elapsed day- 1] field is initialized to '0' (S812). The metadata server initializes min [0] to min [59] and hour [0] to hour [23] of the access count field to '0' (S813), and sets last_min, last_hour and last_day as the current time. Update (S814).

판단 과정(S810)에서의 판단결과 하루 이상 경과하지 않았으면, 상기 메타데이터 서버는 현재 시간과 접근 횟수 필드의 last_hour를 비교하여 last_hour가 한 시간 이상 경과 되었는지를 판단한다(S820).If it is determined that the determination process (S810) has not elapsed more than one day, the metadata server compares the current time and last_hour in the access count field to determine whether last_hour has elapsed by at least one hour (S820).

판단 과정(S820)에서의 판단결과 한 시간 이상 경과 하였으면, 상기 메타데이터 서버는 접근 횟수 필드의 hour[0] 내지 hour[23]을 경과된 시간(hour)만큼 오른쪽으로 시프트시키고(S821), hour[0] 내지 hour[경과된 시간 - 1]를 '0'으로 초기화한다(S822). 그리고, 접근 횟수 필드의 min[0] 내지 min[59]를 '0'으로 초기화하고(S823), last_min 및 last_hour를 현재 시간으로 갱신한다(S824).If more than one hour has passed as a result of determination in the determination process (S820), the metadata server shifts hour [0] to hour [23] in the access count field to the right by an elapsed time (S821), hour [0] to hour [elapsed time-1] are initialized to '0' (S822). Then, min [0] to min [59] in the access count field are initialized to '0' (S823), and last_min and last_hour are updated to the current time (S824).

If the determination in S820 is that a full hour has not elapsed, the metadata server compares the current time with last_min of the access count field to determine whether one or more minutes have elapsed since last_min (S830). If a full minute has not elapsed, the metadata server ends the update of the access count field.

If the determination in S830 is that one or more minutes have elapsed, the metadata server shifts min[0] through min[59] of the access count field to the right by the number of elapsed minutes (S831) and initializes min[0] through min[elapsed minutes - 1] to '0' (S832). The metadata server then updates last_min to the current time (S833).
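The update flow of FIG. 8 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the names `AccessCountField`, `shift_right`, and `refresh` are hypothetical, the lists `minutes`, `hours`, and `days` stand in for min[], hour[], and day[], timestamps are assumed to be integer seconds, and elapsed units are assumed to be obtained by integer division of the time difference, which the description does not specify.

```python
from dataclasses import dataclass, field
from typing import List

MINUTE, HOUR, DAY = 60, 3600, 86400  # assumption: timestamps are integer seconds


def shift_right(buckets: List[int], steps: int) -> None:
    """Move every bucket `steps` slots toward higher indices and zero the first `steps` slots."""
    steps = min(steps, len(buckets))
    if steps > 0:
        buckets[:] = [0] * steps + buckets[: len(buckets) - steps]


@dataclass
class AccessCountField:
    last_min: int
    last_hour: int
    last_day: int
    minutes: List[int] = field(default_factory=lambda: [0] * 60)  # min[0..59]
    hours: List[int] = field(default_factory=lambda: [0] * 24)    # hour[0..23]
    days: List[int] = field(default_factory=lambda: [0] * 365)    # day[0..364]


def refresh(f: AccessCountField, now: int) -> None:
    """Age the counters of one entry relative to `now`, following S810 to S833."""
    elapsed_days = (now - f.last_day) // DAY
    if elapsed_days >= 1:                            # S810
        shift_right(f.days, elapsed_days)            # S811, S812
        f.minutes[:] = [0] * 60                      # S813
        f.hours[:] = [0] * 24
        f.last_min = f.last_hour = f.last_day = now  # S814
        return
    elapsed_hours = (now - f.last_hour) // HOUR
    if elapsed_hours >= 1:                           # S820
        shift_right(f.hours, elapsed_hours)          # S821, S822
        f.minutes[:] = [0] * 60                      # S823
        f.last_min = f.last_hour = now               # S824
        return
    elapsed_minutes = (now - f.last_min) // MINUTE
    if elapsed_minutes >= 1:                         # S830
        shift_right(f.minutes, elapsed_minutes)      # S831, S832
        f.last_min = now                             # S833
```

In this sketch, index 0 of each list always holds the most recent interval, so a shift by the elapsed amount ages the older counts while leaving zeroed slots for the intervals in which no update arrived.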

FIG. 9 is a flowchart illustrating a method of detecting and replicating hot data according to an embodiment of the present invention. In the following, the metadata server is the same server as the metadata server 120 of FIG. 1.

Referring to FIG. 9, the metadata server fetches the check execution cycle from the hot data management table (S901) and, for hot data detection, fetches the inspection period and the access count threshold from the hot data management table (S902). The metadata server fetches from the hot data management table all entries whose latest access time falls within the inspection period counted back from the current time (S903), examines the fetched entries one by one (S904), and checks whether each is hot data (S910). If the entry is not hot data, the metadata server sleeps for the check execution cycle (S920) and returns to the step of fetching entries (S903).

If the entry is hot data, the metadata server fetches the access count field from the hot data management table (S911) and updates the access count field relative to the current time (S912). The metadata server determines whether the inspection period of the hot data is expressed in days (S930). If it is, the metadata server sums the values day[0] through day[inspection period] of the access count field and stores the sum in a hit counter variable (S931). The metadata server then obtains the current number of replicas of the file of the entry in the hot data management table (S960) and determines whether the hit counter divided by the current number of replicas is greater than the threshold (S970). If the quotient is not greater than the threshold, the metadata server proceeds to the entry check step (S904).

If the determination in S970 is that the quotient is greater than the threshold, the metadata server recognizes the file as hot data, replicates the file to a new data server (S971), and increases the number of replicas of the file by 1 (S972). The metadata server then sets the additional-replication field of the entry in the hot data management table to true (S973).

If the determination in S930 is that the inspection period is not expressed in days, the metadata server determines whether the inspection period of the hot data is expressed in hours (S940). If it is, the metadata server sums the values hour[0] through hour[inspection period] of the access count field, stores the sum in the hit counter variable (S941), and then performs the steps from obtaining the current number of replicas (S960) onward.

If the determination in S940 is that the inspection period is not expressed in hours, the metadata server determines whether the inspection period of the hot data is expressed in minutes (S950). If it is, the metadata server sums the values min[0] through min[inspection period] of the access count field, stores the sum in the hit counter variable (S951), and then performs the steps from obtaining the current number of replicas (S960) onward.

If the determination in S950 is that the inspection period is not expressed in minutes either, the metadata server reports that the inspection period value of the hot data is invalid (S980).
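The hot data test of S930 through S980 can be illustrated with the following Python sketch. It is a simplified illustration under stated assumptions rather than the embodiment itself: the function name `is_hot`, the dictionary layout of `counts`, and the unit strings `"day"`, `"hour"`, and `"min"` are hypothetical, and the sum follows the description literally by including index [inspection period].

```python
from typing import Dict, List


def is_hot(counts: Dict[str, List[int]], unit: str, period: int,
           replicas: int, threshold: float) -> bool:
    """Return True when the per-replica hit count over the inspection period exceeds the threshold.

    counts   -- hypothetical layout: {"day": [...365], "hour": [...24], "min": [...60]}
    unit     -- unit of the inspection period: "day", "hour" or "min"
    period   -- inspection period expressed in that unit
    replicas -- current number of replicas of the file (S960), assumed to be at least 1
    """
    if unit not in ("day", "hour", "min"):     # S930, S940, S950
        # S980: the inspection period value is invalid
        raise ValueError(f"invalid inspection period unit: {unit!r}")
    hits = sum(counts[unit][: period + 1])     # S931 / S941 / S951 (index `period` included)
    return hits / replicas > threshold         # S970
```

Dividing by the replica count means that a file which already has extra replicas must attract proportionally more accesses before another replica is created.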

FIG. 10 is a flowchart illustrating a process of reclaiming additional replicas of a file according to an embodiment of the present invention. In the following, the metadata server is the same server as the metadata server 120 of FIG. 1.

Referring to FIG. 10, the metadata server fetches the check execution cycle from the hot data management table (S1001) and fetches the inspection period value from the hot data management table (S1002). The metadata server fetches from the hot data management table all entries whose latest access time does not fall within the inspection period counted back from the current time (S1003), examines the fetched entries one by one (S1004), and checks whether each is marked as hot data (S1010). If the entry is not hot data, the metadata server sleeps for the check execution cycle (S1050) and returns to the step of fetching entries (S1003).

If the entry is marked as hot data, the metadata server determines whether the additional-replication field of the entry is set in the hot data management table (S1020).

If the determination in S1020 is that the additional-replication field is set, the metadata server decreases the number of replicas of the file of the entry by 1 (S1021) and then deletes one of the replicas stored on the data servers (S1022).

The metadata server determines whether the number of replicas of the file of the entry equals a predetermined reference number of replicas (S1030). If the two numbers are equal, the metadata server initializes the additional-replication field of the entry (S1031) and returns to the entry check step (S1004). If they are not equal, it proceeds directly to the entry check step (S1004).

If the determination in S1020 is that the additional-replication field is not set, the metadata server removes the entry from the hot data management table (S1040) and returns to the entry check step (S1004).

In other words, because step S1003 fetches only entries that have not been accessed within the inspection period counted back from the current time, any entry found in the subsequent steps (S1004 through S1040) to be a hot data entry is judged to have been hot data in the past but to be hot data no longer, and the process described above is applied to it.

Thus, the embodiment of FIG. 10 identifies data that was hot data in the past but is no longer hot data and removes the replicas made for it, thereby reducing wasted storage.
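One pass of the reclamation flow of FIG. 10 might look like the following Python sketch. It is offered only as an illustration under assumptions: the `Entry` class, the `reclaim` function, and the `delete_replica` callback are hypothetical names, the inspection period is assumed to be given in seconds, and the sleep step (S1050) and the hot data marking check (S1010) are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    last_access: int        # latest access time, assumed to be integer seconds
    replicas: int           # current number of replicas of the file
    extra_replicated: bool  # additional-replication field


def reclaim(table: Dict[str, Entry], now: int, period_seconds: int,
            base_replicas: int, delete_replica: Callable[[str], None]) -> None:
    """Run one reclamation pass over entries not accessed within the inspection period."""
    # S1003: select entries whose latest access lies outside the inspection period
    stale = [fid for fid, e in table.items() if now - e.last_access > period_seconds]
    for fid in stale:                             # S1004
        entry = table[fid]
        if entry.extra_replicated:                # S1020
            entry.replicas -= 1                   # S1021
            delete_replica(fid)                   # S1022 (hypothetical callback)
            if entry.replicas == base_replicas:   # S1030
                entry.extra_replicated = False    # S1031
        else:
            del table[fid]                        # S1040
```

Under this reading, each pass removes at most one replica per stale file, and an entry leaves the table only on a pass after its replica count has returned to the reference number.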

The technical idea of the present invention has been described above in detail with reference to the preferred embodiments and the accompanying drawings, but it should be noted that this description is illustrative only and does not limit the present invention. A person of ordinary skill in the art will be able, on the basis of the foregoing description, to make various modifications and changes within the scope of the technical idea of the present invention, and the scope of protection of the present invention should therefore be determined by the claims set forth below.

FIG. 1 is a block diagram of an asymmetric storage system to which embodiments of the present invention are applied.

FIG. 2 is a diagram illustrating the structure of a data access count entry according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating a method of processing data access counts according to an embodiment of the present invention.

FIG. 4 is a flowchart illustrating a method of changing the rank of data access counts according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating a method of transmitting data access counts according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating the structure of a hot data management table according to an embodiment of the present invention.

FIG. 7 is a flowchart illustrating a method of storing data access count information according to an embodiment of the present invention.

FIG. 8 is a flowchart illustrating a method of updating an access count field according to an embodiment of the present invention.

FIG. 9 is a flowchart illustrating a method of detecting and replicating hot data according to an embodiment of the present invention.

FIG. 10 is a flowchart illustrating a process of reclaiming additional replicas of a file according to an embodiment of the present invention.

Claims

A hot data management method based on access counts, comprising:

monitoring, by each data server of an asymmetric storage system, the number of accesses to the data that the data server stores, and keeping the access count information for each piece of data up to date;

transmitting, by each data server, the access count information to a metadata server at a predetermined interval; and

performing, by each data server, replication of data or deletion of a replica according to a decision of the metadata server.

The method of claim 1, wherein keeping the access count information up to date comprises:

forming at least one data access count entry comprising a disk identifier field and a data identifier field for distinguishing data, an access count field for recording the access count, and a top list field;

preparing a hash table comprising a predetermined number of hash heads for fast lookup of the data access count entry; and

managing the access count of each piece of data using the data access count entry and the hash table.

The method of claim 2, wherein managing the access count of each piece of data comprises:

upon receiving a data read or delete request, applying a hash function to the data identifier to obtain a result value, and checking whether a data access count entry having that data identifier exists under the corresponding hash head;

incrementing the access count field by one if the data access count entry exists and the request is a read request;

creating a new data access count entry if the data access count entry does not exist and the request is a read request; and

removing the data access count entry if the data access count entry exists and the request is a delete request.

The method of claim 3, further comprising:

changing the rank of the data access count entry when the access count of the data access count entry is incremented.

The method of claim 4, wherein changing the rank comprises:

searching for a data access count entry having a higher priority than the data access count entry whose access count was incremented; and

if such an entry is found, placing the data access count entry immediately after the found entry in rank, and if no such entry is found, making the data access count entry the entry having the highest priority.

The method of claim 4, wherein changing the rank comprises:

identifying, in the top list, the entry preceding the data access count entry whose access count was incremented;

setting the preceding entry as a temporary entry;

if the access count of the temporary entry is greater than or equal to the access count of the data access count entry, placing the data access count entry as the entry next to the temporary entry; and

if the access count of the temporary entry is less than the access count of the data access count entry, replacing the temporary entry with the entry preceding it in the top list, and repeating this until the access count of the temporary entry is greater than or equal to the access count of the data access count entry.

The method of claim 3, wherein creating the new data access count entry comprises:

creating a data structure comprising a disk identifier field and a data identifier field, an access count field for recording the access count, and a top list field for linking the data access count entries to one another;

storing the disk identifier and the data identifier of the data in the disk identifier field and the data identifier field;

initializing the access count field to '1';

linking the entry to the hash head of the hash table that corresponds to the hash result value of the data identifier; and

setting the top list field value so that the entry has the lowest priority.

The method of claim 1, wherein the transmitting comprises:

transmitting the access count information of the data in order of highest access count first, up to a preset maximum number of transmissions; and

initializing the data access count entries.

A hot data management method based on access counts, comprising:

configuring a hot data management table including an access count field for each piece of data;

collecting, from one or more data servers, access count information for the data stored by each data server;

updating the hot data management table according to the access count information;

determining whether data is hot data by checking the hot data management table at a predetermined interval;

replicating a data file determined to be hot data to a new data server; and

deleting a replica of a data file that is no longer hot data.

The method of claim 9, wherein updating the hot data management table comprises:

obtaining a file identifier from the disk identifier and the data identifier included in the received access count information;

if an entry having the file identifier exists in the hot data management table, updating that entry based on the access count information; and

if no entry having the file identifier exists in the hot data management table, adding and initializing a new entry.

The method of claim 10, wherein updating the entry comprises:

obtaining the elapsed days, elapsed hours, or elapsed minutes from the last update time to the present time;

shifting the min[0] to min[63], hour[0] to hour[23], or day[0] to day[364] array constituting the access count field to the right by the elapsed minutes, elapsed hours, or elapsed days, respectively;

increasing the min[0] value, hour[0] value, or day[0] value by the data access count value; and

updating the latest access time to the current time.

The method of claim 10, wherein adding and initializing the new entry comprises:

initializing min[0] to min[63], hour[0] to hour[23], and day[0] to day[364] of the access count field of the new entry to '0';

setting last_min, last_hour, and last_day of the access count field of the new entry to the current time; and

inserting the new entry into the hot data management table.

The method of claim 9, wherein determining whether data is hot data comprises:

checking the check execution cycle;

for each entry in the hot data management table that was updated within the check execution cycle counted back from the current time, calculating the number of accesses within that cycle;

obtaining the number of replicas of the data file corresponding to the entry;

checking whether the number of accesses divided by the number of replicas exceeds a predetermined threshold; and

if the threshold is exceeded, determining the data corresponding to the entry to be hot data.

The method of claim 13, wherein the calculating comprises:

obtaining the number of accesses by adding the day[0] to day[inspection period] values of the access count field if the check execution cycle is expressed in days;

obtaining the number of accesses by adding the hour[0] to hour[inspection period] values of the access count field if the check execution cycle is expressed in hours; and

obtaining the number of accesses by adding the min[0] to min[inspection period] values of the access count field if the check execution cycle is expressed in minutes.

The method of claim 9, wherein the replicating comprises:

replicating the data determined to be hot data to a new data server;

increasing the number of replicas of the file by one; and

setting the additional-replication field of the entry in the hot data management table to true.

The method of claim 9, wherein deleting the replica comprises:

identifying entries in the hot data management table that were hot data in the past but are not currently hot data;

decreasing the number of replicas of the file corresponding to the identified entry by one; and

deleting one of the existing replicas.

The method of claim 16, further comprising:

initializing the additional-replication field of the entry if the number of replicas of the file equals a predetermined reference number of replicas.

The method of claim 16, further comprising:

removing the identified entry from the hot data management table if the additional-replication field is not set in the identified entry.

The method of claim 16, wherein the identifying comprises:

checking the value of the latest access time field of each entry in the hot data management table; and

selecting the entries whose latest access time does not fall within the inspection period counted back from the current time.

A hot data management method, comprising:

maintaining, by a plurality of data servers, access counts for the data that the data servers store and manage;

transmitting, by the plurality of data servers, the access counts for the stored data to a management server at each predetermined period;

collecting and storing, by the management server, the transmitted access counts; and

recognizing, by the management server at each predetermined period, data whose access count per data server within the period exceeds a preset threshold as hot data, and additionally replicating the hot data to one or more of the plurality of data servers.