KR101218712B1

KR101218712B1 - System and Method for data distributing

Info

Publication number: KR101218712B1
Application number: KR1020110040631A
Authority: KR
Inventors: 이상호; 심재영; 이재광
Original assignee: 주식회사 디케이아이테크놀로지
Priority date: 2011-04-29
Filing date: 2011-04-29
Publication date: 2013-01-09
Also published as: KR20120122462A

Abstract

The present invention relates to a data distribution system and a data distribution method that distribute and store large volumes of data in real time based on the performance of the data storage devices, so that a system can be built efficiently from low-capacity data storage devices.
The data distribution system and data distribution method according to the present invention comprise: a plurality of data storage means for executing SQL and storing data; a metadata storage means for storing metadata about the data stored in each data storage means, together with metadata that includes priority information for each data storage means; and a data processing means that calculates priorities based on the performance of the data storage means and stores them in the metadata storage means, and that analyzes storage data received from a terminal and, when it is large-volume distributed data, stores the data across the plurality of data storage means in a round-robin manner according to the priorities held in the metadata storage means.

Description

Data distribution system and its data distributing method {System and Method for data distributing}

The present invention relates to a data distribution system and a data distribution method that distribute and store large volumes of data in real time based on the performance of the data storage devices, so that a system can be built efficiently from low-capacity data storage devices.

In general, a data distribution system comprises a plurality of databases, that is, storage means. Each storage means is operated by managing metadata that consists of index information for the data stored in it.

With the recent rapid growth of the Internet, both the variety and the volume of data stored in data distribution systems are increasing.

Accordingly, the storage means that make up a data distribution system are being replaced with higher-capacity storage means.

In particular, data distribution systems have generally focused on storing data in a distributed manner and then easily combining and outputting the distributed data. When large volumes of data such as network traffic data are generated in real time, high-performance, high-capacity storage means are required to distribute and process that data quickly.

That is, to speed up processing of the large volumes of data needed to provide Internet services, a telecommunications operator discards its existing low-capacity data storage means and builds a new data distribution system from high-performance, high-capacity storage means able to accommodate the required capacity.

However, a data distribution system is used mainly to store data in a distributed manner, so building one by switching to high-capacity storage devices merely to secure storage capacity incurs a cost penalty.

Accordingly, a technical object of the present invention is to provide a data distribution system and a data distribution method that set priorities based on the performance of the data storage means and store incoming data in a distributed, round-robin manner according to those priorities, so that even large volumes of data can readily be distributed across low-capacity storage means.

To achieve the above object, a data distribution system according to a first aspect of the present invention comprises: a plurality of data storage means for executing SQL and storing data; a metadata storage means for storing metadata about the data stored in each data storage means, together with metadata that includes priority information for each data storage means; and a data processing means that calculates priorities based on the performance of the data storage means and stores them in the metadata storage means, and that analyzes storage data received from a terminal and, when it is large-volume distributed data, stores the data across the plurality of data storage means in a round-robin manner according to the priorities held in the metadata storage means.

Further, the metadata about the data storage means held in the metadata storage means includes current load information.

Further, the priority information of a data storage means is calculated based on its data processing speed, storage capacity, and current load.

Further, the priority information of the data storage means is calculated as follows: an initial priority value is set for each data storage means according to its performance, with the initial priority values of all data storage means summing to 100; the current priority value is initially set equal to the initial priority value; when data is stored, the current priority value of the data storage means is decreased by a first reference value set by the operator; SQL is executed on the data storage means with the smallest current priority value; and a second reference value set by the operator is added to the current priority value of the data storage means on which the SQL was executed.

Further, the data processing means analyzes storage data received from a terminal and, when the storage data is determined not to be distributed data, executes the SQL statement on all data storage means to store the data.

Further, the data processing means analyzes a read request received from a terminal. If the request concerns distributed data, it executes SQL on all data storage means to read the distributed data, combines the data read, and provides the result to the terminal. If the request does not concern distributed data, it executes SQL on the data storage means with the lowest current load, reads the data, and provides it to the terminal.

Further, the data processing means comprises: a distributed data management block that provides various APIs (Application Programming Interfaces) for insert, delete, and search query processing on distributed data; a distributed data processing block comprising an SQL parser that interprets the SQL statement to be processed and checks whether it contains distributed table information, a query router that determines the data storage means on which the SQL statement will be processed, based on the parsed SQL syntax information and the metadata for each data storage means held in the metadata storage means, and a distributed query processor that determines, in a round-robin manner based on the priority information for the data storage means provided by the query router, the data storage means to which data will be stored or from which it will be read; a storage means management block comprising a load balancer that checks current state information, including the current load, for each data storage means, and a distribution management unit that calculates priority values for distributed data storage based on the data processing speed and storage capacity of each data storage means and the current load information provided by the load balancer; and a metadata processing block that stores metadata including the priority information and current load information for the data storage means provided by the storage means management block, and also stores metadata for the data stored in the data storage means.

Further, to achieve the above object, a data distribution method according to a second aspect of the present invention is performed in a data distribution system in which a plurality of data storage means and a metadata storage means are coupled to a data processing means, and comprises: a first step of calculating a priority value for each data storage means based on its performance and storing the value in the metadata storage means; a second step of splitting storage data into a plurality of pieces when large-volume storage data is received from a terminal; and a third step of storing the split data in a distributed, round-robin manner based on the priority values for the data storage means held in the metadata storage means.

Further, the first step calculates the priority value based on the data processing speed, storage capacity, and current load of the data storage means.

Further, the first step calculates the priority value through: step 11, in which an initial priority value is set for each data storage means based on its performance, with the initial priority values of all data storage means summing to 100; step 12, in which the current priority value is initially set equal to the initial priority value; step 13, in which, when data is stored, the current priority value of the data storage means is decreased by a first reference value set by the operator; and step 14, in which SQL is executed on the data storage means with the smallest current priority value and a second reference value set by the operator is added to the current priority value of the data storage means on which the SQL was executed.

Further, the first step additionally stores the current load of each data storage means in the metadata storage means.

Further, the second step additionally comprises step 21, in which, when the storage data received from the terminal is not large-volume data, the SQL statement is executed on all data storage means to store the data.

Further, the method additionally comprises a fourth step in which, when a terminal requests to read large-volume distributed data, SQL is executed on all data storage means to read the distributed data, and the data read is combined and provided to the terminal.

According to the present invention, the data processing means performs the main processing for data distribution while the data storage means perform only SQL execution and storage, and large volumes of split data are stored in a distributed manner that takes the performance of the data storage means into account. As a result, large-volume distributed data can readily be handled using low-capacity data storage means.

FIG. 1 is a diagram showing the schematic configuration of a data distribution system according to the present invention.
FIG. 2 is a diagram showing the internal configuration of the data processing means 100 of FIG. 1, divided by function.
FIG. 3 is a diagram showing the internal configuration of the distributed data processing block 120 of FIG. 2, divided by function.
FIG. 4 is a diagram showing the internal configuration of the storage means management block 130 of FIG. 2, divided by function.
FIG. 5 is a flowchart illustrating the data storage operation of the data distribution system of FIG. 1.
FIG. 6 is a flowchart illustrating the data reading operation of the data distribution system of FIG. 1.

Embodiments of the present invention are described below with reference to the drawings. The embodiments described here illustrate one preferred implementation of the invention and are not intended to limit its scope. The invention may be practiced with various modifications that do not depart from its technical idea.

FIG. 1 is a diagram showing the schematic configuration of a data distribution system according to the present invention.

As shown in FIG. 1, the data distribution system according to the present invention comprises a data processing means 100, a plurality of data storage means 200, and a metadata storage means 300, all connected for mutual communication over a LAN 1. The metadata storage means 300 stores metadata for the data stored in each data storage means 200, together with metadata that includes priority information and current load information for each data storage means 200.

A plurality of terminals 400 are also connected to the LAN 1. A terminal 400 may be any kind of Internet-capable communication terminal, such as a user computer, that connects to the data storage means 200 to store data or to read data stored there. For example, the terminal 400 may be a communication terminal that is wirelessly coupled to Internet-capable mobile communication terminals and provides various data.

The data processing means 100 is also provided with an operator terminal 500 for entering information related to data distribution processing, such as priority condition information and the addition or removal of data storage means 200, and for outputting such information.

The data processing means 100 analyzes the metadata of the data provided by a terminal 400, selects a suitable data storage means 200 from among the plurality of data storage means 200, and stores the data there. Large-volume data such as network traffic data is split into multiple pieces, which are distributed across the plurality of data storage means 200. When selecting the data storage means 200 for each piece, the data processing means 100 sets a priority for each data storage means 200 based on state information that includes its data processing speed, storage capacity, and current load, and stores the data in a distributed, round-robin manner according to those priorities.

In calculating the priority values for the data storage means 200, the data processing means 100 may apply the following method.

1. Every data storage means 200 has an initial priority value and a current priority value.

2. The initial priority values of all data storage means 200 are set so that they sum to 100.

3. Initially, the current priority value is set equal to the initial priority value.

4. When data is stored, the current priority value of the data storage means 200 is decreased by a first reference value set by the operator, for example "100 / number of data storage means 200".

5. SQL is executed on the data storage means 200 with the smallest current priority value.

6. A second reference value set by the operator is added to the current priority value of the data storage means 200 on which the SQL was executed.

Here, the initial priority value may be set by the operator according to the performance of the data storage means 200, including its data processing speed and storage capacity.
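The six steps above can be read as a small selection routine. The following Python sketch is one literal reading of them, not the patented implementation: it assumes that step 4 lowers the current priority value of every data storage means on each store operation, and it uses 100 as the second reference value, which the text leaves to the operator. The names `Store`, `make_stores`, and `select_store` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Store:
    name: str
    initial_priority: float   # set by the operator from the store's performance
    current_priority: float   # starts equal to initial_priority (step 3)


def make_stores(initial_priorities: dict[str, float]) -> list[Store]:
    """Steps 1-3: give every store an initial and a current priority value."""
    if abs(sum(initial_priorities.values()) - 100) > 1e-9:
        raise ValueError("initial priority values must sum to 100 (step 2)")
    return [Store(name, p, p) for name, p in initial_priorities.items()]


def select_store(stores: list[Store], first_ref: float, second_ref: float) -> Store:
    """Steps 4-6 for one store operation."""
    for store in stores:
        # Step 4 (assumption: the decrease applies to every store).
        store.current_priority -= first_ref
    # Step 5: the store with the smallest current priority value runs the SQL.
    chosen = min(stores, key=lambda s: s.current_priority)
    # Step 6: add the second reference value to the chosen store.
    chosen.current_priority += second_ref
    return chosen


stores = make_stores({"A": 40, "B": 30, "C": 20, "D": 10})
first_ref = 100 / len(stores)   # the example value given in step 4
second_ref = 100                # assumed; the source gives no value
picks = [select_store(stores, first_ref, second_ref).name for _ in range(8)]
print(picks)  # ['D', 'C', 'B', 'A', 'D', 'C', 'B', 'A']
```

With these assumed reference values the current priority values return to their initial state after one pass over all stores, so the selection cycles through the stores in ascending order of initial priority value. Other operator-chosen reference values would give a different schedule.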

The data processing means 100 also analyzes the metadata for a read request from a terminal 400, reads the corresponding data from the data storage means 200, and provides it to the terminal 400. For data that was split and stored in a distributed manner, the data processing means 100 recombines the pieces into the original large-volume data before providing it to the terminal 400.

FIG. 2 is a block diagram showing the internal configuration of the data processing means 100 of FIG. 1, divided by function.

As shown in FIG. 2, the data processing means 100 comprises: a distributed data management block 110 that provides various APIs (Application Programming Interfaces) for insert, delete, and search query processing on distributed data; a distributed data processing block 120 that processes SQL (Structured Query Language) statements on the data storage means 200 selected on the basis of their metadata, thereby performing distributed storage and reading of data; a storage means management block 130 that manages the state of each data storage means 200 and calculates the priority values; a metadata processing block 140 that performs overall management of the metadata storage means 300, such as updating it; and an interface block 150 for communicating with the data storage means 200 and the metadata storage means 300.

As shown in FIG. 3, the distributed data processing block 120 comprises: an SQL parser 121 that interprets the SQL statement to be processed and checks whether it contains distributed table information; a query router (QR) 122 that determines the data storage means 200 on which the SQL statement will be processed, based on the parsed SQL syntax information and the metadata for each data storage means 200 held in the metadata storage means 300; and a distributed query processor (DQP) 123 that determines, in a round-robin manner based on the priority information for the data storage means 200 provided by the query router 122, the data storage means 200 in which data will be stored, or on which data will be modified or searched, that is, read. The query router 122 can change the priority condition information for the data storage means 200 at the request of the operator terminal 500. The initial priority value for each data storage means 200 may be set freely by the operator.
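The division of work between the SQL parser and the query router can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the regular-expression "parser" handles just simple INSERT and SELECT statements, the set of distributed table names stands in for the distributed table information in the metadata, and the names `ParsedStatement`, `parse`, `route`, and the returned strategy labels are all hypothetical. The four routing outcomes follow the store and read behavior described in this document.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns that locate the target table of a simple statement.
_TABLE_PATTERNS = {
    "INSERT": r"INSERT\s+INTO\s+(\w+)",
    "SELECT": r"\bFROM\s+(\w+)",
}


@dataclass(frozen=True)
class ParsedStatement:
    verb: str             # "INSERT" (store) or "SELECT" (read)
    table: str
    is_distributed: bool  # True if the table is registered as distributed


def parse(sql: str, distributed_tables: set[str]) -> ParsedStatement:
    """Stand-in for the SQL parser: find the verb and check the table."""
    words = sql.split()
    if not words:
        raise ValueError("empty SQL statement")
    verb = words[0].upper()
    pattern = _TABLE_PATTERNS.get(verb)
    if pattern is None:
        raise ValueError(f"unsupported statement type: {verb}")
    match = re.search(pattern, sql, re.IGNORECASE)
    if match is None:
        raise ValueError("could not find a table name")
    table = match.group(1).lower()
    return ParsedStatement(verb, table, table in distributed_tables)


def route(stmt: ParsedStatement) -> str:
    """Stand-in for the query router: pick a strategy for the statement."""
    if stmt.verb == "INSERT":
        return "priority_round_robin" if stmt.is_distributed else "all_stores"
    return "all_stores_then_merge" if stmt.is_distributed else "least_loaded_store"


distributed = {"traffic_log"}  # hypothetical distributed table
print(route(parse("INSERT INTO traffic_log VALUES (1, 'x')", distributed)))
print(route(parse("SELECT * FROM users WHERE id = 7", distributed)))
```

A real deployment would use a full SQL parser; the regular expressions here only keep the sketch short.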

As shown in FIG. 4, the storage means management block 130 comprises a load balancer 131 that checks current state information, including the current load, for each data storage means 200, and a distribution management unit 132 that calculates the priority value of each data storage means 200 for distributed data storage, based on its data processing speed and storage capacity and on the current load information provided by the load balancer 131. The distribution management unit 132 stores the current load information and the priority information for each data storage means 200 in the metadata storage means 300 through the metadata processing block 140. When the distribution management unit 132 determines that the current load of a data storage means 200 is at or above a preset amount, it may request, through the metadata processing block 140, that the current load information for that data storage means 200 held in the metadata storage means 300 be updated. It may likewise request such an update when the load of a data storage means 200 reaches or exceeds a certain ratio relative to its data processing speed and storage capacity. The distribution management unit 132 also registers additional data storage means 200 and removes existing ones using a hash-partition technique.
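The two update conditions described for the distribution management unit can be expressed as a simple predicate. The sketch below is a simplification under stated assumptions: load and capacity are expressed in the same arbitrary unit, the ratio is taken against storage capacity only (the text also mentions data processing speed but gives no formula), and the names `StoreState` and `should_refresh_metadata` and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoreState:
    capacity: float      # storage capacity, arbitrary unit (assumed)
    current_load: float  # current load in the same unit (assumed)


def should_refresh_metadata(state: StoreState,
                            preset_load: float,
                            ratio_limit: float) -> bool:
    """Return True if the stored load information should be updated."""
    # Condition 1: the current load is at or above a preset amount.
    over_preset = state.current_load >= preset_load
    # Condition 2: the load is at or above a given ratio of capacity.
    over_ratio = state.current_load / state.capacity >= ratio_limit
    return over_preset or over_ratio


print(should_refresh_metadata(StoreState(100, 85), preset_load=90, ratio_limit=0.8))
print(should_refresh_metadata(StoreState(100, 50), preset_load=90, ratio_limit=0.8))
```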

The metadata processing block 140 updates the metadata storage means 300 based on the priority information and current load information for the data storage means 200 provided by the storage means management block 130, and also stores the metadata for storage data provided by the distributed data processing block 120. The metadata for large-volume data such as network traffic data includes distributed table information for the split data.

The interface block 150 is a platform for communicating with the various types of database that serve as data storage means 200, such as ORACLE, MASQL, and MYSQL.

The operation of the data distribution system configured as above is now described with reference to the flowcharts of FIG. 5 and FIG. 6. FIG. 5 shows the data storage operation of the data distribution system, and FIG. 6 shows its data reading operation.

1. Data storage (FIG. 5)

First, performance information for each data storage means 200, including its data processing speed and storage capacity, is registered together with priority condition information in the storage means management block 130 of the data processing means 100.

The storage means management block 130 of the data processing means 100 checks the current load of each data storage means 200 in real time, calculates a priority value for each data storage means 200 based on the priority condition information, and registers the value in the metadata storage means 300 through the metadata processing block 140.

With the priority information for each data storage means 200 varying with the current load as described above, when data is received from a terminal 400 by the data processing means 100 (ST1), the data processing means 100 analyzes the SQL statement of the received data to check whether it is storage data (ST2). This analysis is performed by the SQL parser 121 of the distributed data processing block 120.

If the data is determined in step ST2 to be storage data, the data processing means 100 interprets the SQL statement to check whether the storage data is large-volume distributed data (ST3). This interpretation is performed by the SQL parser 121 of the distributed data processing block 120.

If the storage data is determined in step ST3 to be distributed data, the data processing means 100 executes the SQL statement on the data storage means 200 selected by priority-based round-robin and stores the data in a distributed manner (ST4). The data may be split evenly into units of a fixed size, or the split may be adjusted automatically or manually according to the load on the data storage means 200. The data processing means 100 stores the split data in a round-robin manner based on the priority information for each data storage means 200 registered in the metadata storage means 300, and the distribution information for the split data is stored in the metadata storage means 300. The data storage means 200 is selected on the basis of the priority information by the query router 122 of the distributed data processing block 120, and the split data is written to each data storage means 200 by the distributed query processor 123.

On the other hand, when the stored data input in step ST3 is determined not to be distributed data, the data processing means 100 executes the SQL statement on all data storage means 200 and stores the data in every data storage means 200 (ST5). The conditions that determine which data storage means 200 hold non-distributed data may be set arbitrarily by the operator.
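Step ST5 amounts to running the same statement on every data storage means. The sketch below stands in for this with three in-memory SQLite databases; the use of `sqlite3`, the function name `store_everywhere`, and the table `code` are assumptions made for illustration and are not taken from the patent.

```python
import sqlite3

def store_everywhere(stores, sql, params=()):
    """Execute one SQL statement on every data storage means."""
    for conn in stores:
        conn.execute(sql, params)
        conn.commit()

# Three in-memory databases stand in for the data storage means 200.
stores = [sqlite3.connect(":memory:") for _ in range(3)]
for conn in stores:
    conn.execute("CREATE TABLE code (id INTEGER, label TEXT)")

# A non-distributed row is written to all of them.
store_everywhere(stores, "INSERT INTO code VALUES (?, ?)", (1, "ok"))
```

Because every data storage means then holds a full copy, a later read of this data can be served by any single one of them.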

2. Data read (Figure 6)

First, with the divided pieces of large-volume data distributed and stored across the plurality of data storage means 200 in the priority-based round-robin manner described above, when arbitrary data is applied from the terminal 400 to the data processing means 100 (ST11), the data processing means 100 analyzes the SQL statement of the input data to check whether it is read data (ST12). Read data may be, for example, search data, and the SQL statement is analyzed by the SQL parser 121 of the distributed data processing block 120.

If the data input in step ST12 is determined to be read data, the data processing means 100 interprets the SQL statement to check whether the read data involves a distributed table (ST13). The SQL statement is interpreted by the SQL parser 121 of the distributed data processing block 120.

When the read data input in step ST13 is determined to be distributed data having a distributed table, the data processing means 100 executes the SQL statement on all data storage means 200 to find the data storage means 200 that hold the distributed data, reads the distributed data from those data storage means 200, combines it, and provides the result to the terminal 400 (ST14).

On the other hand, when the read data input in step ST13 is determined not to have a distributed table, the data processing means 100 searches the metadata storage means 300, has the SQL statement executed through the data storage means 200 with the smallest current load, reads the data from that data storage means 200, and provides it to the terminal 400 (ST15).
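The two read paths of steps ST14 and ST15 can be illustrated with a small sketch. In-memory SQLite databases again stand in for the data storage means, and the `loads` dictionary stands in for the current load information held in the metadata storage means 300; the function name `route_read`, the table names, and the load figures are hypothetical.

```python
import sqlite3

def route_read(stores, loads, sql, distributed):
    """Serve a read request.

    stores: dict mapping node name to a sqlite3 connection.
    loads:  dict mapping node name to its current load (lower is idler).
    """
    if distributed:
        # ST14: run the query on every node and combine the partial results.
        rows = []
        for conn in stores.values():
            rows.extend(conn.execute(sql).fetchall())
        return rows
    # ST15: run the query only on the node with the smallest current load.
    target = min(loads, key=loads.get)
    return stores[target].execute(sql).fetchall()

stores = {name: sqlite3.connect(":memory:") for name in ("s1", "s2")}
for conn in stores.values():
    conn.execute("CREATE TABLE log (id INTEGER)")   # distributed table
    conn.execute("CREATE TABLE code (id INTEGER)")  # replicated table
    conn.execute("INSERT INTO code VALUES (7)")
stores["s1"].execute("INSERT INTO log VALUES (1)")
stores["s2"].execute("INSERT INTO log VALUES (2)")
loads = {"s1": 0.8, "s2": 0.1}
```

Here a distributed read of `log` gathers one row from each node, whereas a read of the replicated `code` table is answered by `s2` alone because it reports the lower load.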

That is, according to the above embodiment, the data processing means performs the main processing for data distribution while the data storage means perform only SQL execution and storage, and the divided pieces of large-volume data are distributed and stored in consideration of the performance and current load of the data storage means. As a result, large-volume distributed data can be processed easily using low-performance, low-capacity data storage means.

100: data processing means, 110: distributed data management block,
120: distributed data processing block, 130: storage means management block,
140: metadata processing block, 150: interface block,
200: data storage means, 300: metadata storage means,
400: terminal, 500: operator terminal.

Claims

A data distribution system comprising:
a plurality of data storage means for executing SQL and storing data;
metadata storage means for storing metadata including metadata about the data stored in each data storage means and priority information for each data storage means; and
data processing means for calculating a priority based on the performance of the data storage means and storing it in the metadata storage means, and for analyzing stored data applied from a terminal and, in the case of large-volume distributed data, distributing and storing the data across the plurality of data storage means in a round-robin manner according to the priority of the data storage means stored in the metadata storage means,
wherein, for the priority information of the data storage means, an initial priority value is set according to the performance of each data storage means, the sum of the initial priority values of all data storage means is set to 100, the current priority value is initially set equal to the initial priority value, the current priority value of the relevant data storage means is reduced by a first reference value set by the operator when data is stored, SQL is executed in the data storage means having the smallest current priority value, and a second reference value set by the operator is added to the current priority value of the data storage means in which the SQL was executed.
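As an illustration only, the priority bookkeeping recited above could be sketched as follows. This is one literal reading of the claim language, not the claimed implementation: the class name `PriorityTable`, the use of a raw performance score per data storage means, and the split into three separate operations are assumptions, and the claim does not fix the order in which those operations are invoked.

```python
def initial_priorities(performance):
    """Scale raw performance scores so the initial values sum to 100."""
    total = sum(performance.values())
    return {node: 100 * score / total for node, score in performance.items()}

class PriorityTable:
    def __init__(self, performance, first_ref, second_ref):
        self.initial = initial_priorities(performance)
        self.current = dict(self.initial)  # current starts equal to initial
        self.first_ref = first_ref         # operator-set first reference value
        self.second_ref = second_ref       # operator-set second reference value

    def on_store(self, node):
        """Reduce the current value of the node that just stored data."""
        self.current[node] -= self.first_ref

    def pick_for_sql(self):
        """Return the node with the smallest current priority value."""
        return min(self.current, key=self.current.get)

    def on_sql_executed(self, node):
        """Add the second reference value to the node that executed SQL."""
        self.current[node] += self.second_ref
```

With performance scores of 2, 1, and 1, the initial values come out as 50, 25, and 25, which sum to 100 as the claim requires.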

The system of claim 1,
wherein the metadata about the data storage means stored in the metadata storage means includes current load information.

The system of claim 1,
wherein the priority information of the data storage means is calculated based on the data processing speed, the storage capacity, and the current load of the data storage means.

delete

The system of claim 1,
wherein the data processing means analyzes the stored data applied from the terminal and, when the input stored data is determined not to be distributed data, executes the SQL statement on all data storage means to store the stored data.

The system of claim 1,
wherein the data processing means analyzes read data applied from the terminal and, in the case of distributed data, executes SQL on all data storage means to read the distributed and stored data, combines the read data, and provides it to the terminal, and,
in the case of non-distributed data, reads the data by executing SQL through the data storage means having the smallest load and provides it to the terminal.

The system of claim 1,
wherein the data processing means comprises:
a distributed data management block that provides various application programming interfaces (APIs) for insertion, deletion, and search query processing of data subject to distributed processing;
a distributed data processing block comprising an SQL parser that analyzes the SQL statement to be processed to check whether it includes distributed table information, a query router that determines the data storage means to process the SQL statement based on the parsed SQL syntax information and the metadata about each data storage means stored in the metadata storage means, and a distributed query processor that determines, in a round-robin manner based on the priority information on the data storage means provided from the query router, the data storage means in which data is stored or from which data is read;
a storage means management block comprising a load balancer that checks current status information, including the current load, of each data storage means, and a distribution management unit that calculates a priority value for storing distributed data based on the data processing speed and storage capacity of each data storage means and the current load information provided from the load balancer; and
a metadata processing block that stores metadata including the priority information and current load information of the data storage means provided from the storage means management block, as well as metadata about the data stored in the data storage means.

A data distribution method for a data distribution system in which a plurality of data storage means and a metadata storage means are coupled to a data processing means, the method comprising:
a first step of calculating a priority value for each data storage means based on the performance of each data storage means and storing it in the metadata storage means;
a second step of dividing the stored data into a plurality of pieces when large-volume stored data is input from a terminal; and
a third step of distributing and storing the divided data in a round-robin manner based on the priority values for the data storage means stored in the metadata storage means,
wherein the first step calculates the priority value through an eleventh step of setting an initial priority value according to the performance of each data storage means such that the sum of the initial priority values of all data storage means is 100;
a twelfth step of initially setting the current priority value equal to the initial priority value;
a thirteenth step of reducing the current priority value of the relevant data storage means by a first reference value set by the operator when data is stored; and
a fourteenth step of executing SQL in the data storage means having the smallest current priority value and adding a second reference value set by the operator to the current priority value of the data storage means in which the SQL was executed.

The method of claim 8,
wherein the first step calculates the priority value based on the data processing speed, the storage capacity, and the current load of the data storage means.

delete

The method of claim 8 or 9,
wherein the first step further comprises storing the current load of the data storage means in the metadata storage means.

The method of claim 8,
wherein the second step further comprises a twenty-first step of executing the SQL statement on all data storage means to store the stored data when the stored data input from the terminal is not large-volume data.

The method of claim 8,
further comprising a fourth step of, when a read request for large-volume distributed data is received from the terminal, executing SQL on all data storage means to read the distributed and stored data, combining the read data, and providing it to the terminal.