KR101830504B1

KR101830504B1 - In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment

Info

Publication number: KR101830504B1
Application number: KR1020160060908A
Authority: KR
Inventors: 안재훈; 손재기; 박창원
Original assignee: 전자부품연구원
Priority date: 2016-05-18
Filing date: 2016-05-18
Publication date: 2018-02-20
Also published as: WO2017200129A1; KR20170130178A

Abstract

A method and system supporting in-memory DB connection for real-time analysis of big data in a distributed environment are provided. A data processing method according to an embodiment of the present invention analyzes data in a distributed environment using a distributed system, and dynamically changes the structure of the distributed DB constituting the distributed system according to that environment. Accordingly, the number of shards can be managed dynamically, thereby adaptively ensuring concurrency in a distributed environment, securing real-time performance through TMO-based scheduling, and ultimately improving the speed and efficiency of big data analysis.

Description

BACKGROUND OF THE INVENTION

The present invention relates to a distributed computing technology, and more particularly, to a DBMS structure and a scheduling method for analyzing big data in real time based on a distributed environment.

Searching and analyzing big data stored in a DBMS is limited in two respects: the distributed system suffers marked performance degradation and inefficient management, and only fragmentary analysis within a predetermined structure is possible.

In particular, when the data is large, response delays occur frequently and configuring the environment for application to a distributed environment becomes complicated.

Currently, real-time stream data processing engines such as Storm and Spark, which are provided as open source, cannot support various types of data simultaneously.

Furthermore, when new nodes are added to analyze big data in a distributed environment using such open source engines, the JVM (Java Virtual Machine) environment must be set up anew, which is cumbersome.

SUMMARY OF THE INVENTION It is an object of the present invention to provide a big data processing method that dynamically manages the number of shards in order to secure concurrency, and a DBMS to which the same is applied.

It is another object of the present invention to provide a big data processing method that performs Time-triggered Message-triggered Object (TMO)-based scheduling as a means of ensuring real-time performance, and a DBMS using the same.

According to an aspect of the present invention, there is provided a data processing method including: analyzing data in a distributed environment using a distributed system; and dynamically changing the structure of the distributed DB constituting the distributed system based on the distributed environment.

In the changing step, at least one of the shards constituting the distributed DB may be removed or a new shard may be added.

The changing step may change the structure of the distributed DB when the number of distributed servers constituting the distributed system is changed.

The distributed DB can be linked to the in-memory DB.

The method may further include performing scheduling for analyzing data of distributed servers constituting the distributed system.

The scheduling may be performed as time-driven deadline scheduling and event-driven deadline scheduling.

In addition, the data may be composite data in which at least one of structured data, semi-structured data, and unstructured data is combined.

The scheduling step may perform the scheduling based on a time-triggered message-triggered object (TMO).

The performing of the scheduling may select a distributed server capable of processing the scheduled work within a deadline.

Meanwhile, according to another embodiment of the present invention, a DBMS includes: a distributed DB; and a distributed server that analyzes data in a distributed environment and dynamically changes the structure of the distributed DB based on the distributed environment.

According to another aspect of the present invention, there is provided a data processing method including: performing scheduling for data analysis by the distributed servers constituting a distributed system based on a Time-triggered Message-triggered Object (TMO); and analyzing the data in the distributed environment according to the scheduling result.

Meanwhile, according to another embodiment of the present invention, a DBMS includes: a distributed DB; and a distributed server that performs TMO-based scheduling for analyzing data stored in the distributed DB and analyzes the data in a distributed environment according to the scheduling result.

As described above, according to the embodiments of the present invention, the number of shards can be managed dynamically to adaptively ensure concurrency in a distributed environment, real-time performance can be ensured through TMO-based scheduling, and ultimately the speed and efficiency of big data analysis can be improved.

FIG. 1 is a diagram schematically illustrating a method of processing big data according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a DBMS according to another embodiment of the present invention.
FIG. 3 is a diagram provided for explaining a method of processing big data according to another embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the drawings.

1. Big Data Processing Method

FIG. 1 is a diagram schematically illustrating a method of processing big data according to an embodiment of the present invention.

As shown in FIG. 1, a method of processing big data according to an embodiment of the present invention includes mapping source stream data to the distributed systems by scheduling, and filtering the distributed systems for data analysis.

As shown in FIG. 1, the big data processing method according to the embodiment of the present invention is applied to a distributed system environment having a structure using a plurality of shards in order to ensure concurrency.

The number of shards in the distributed system is dynamic. That is, shards can be freely added or removed according to the needs of the distributed system environment.

Also, as shown in FIG. 1, the big data processing method according to an embodiment of the present invention uses Time-triggered Message-triggered Object (TMO)-based scheduling for real-time processing.

As a result, it is possible to improve throughput for complex-type, high-speed stream big data, maximize query processing and analysis speed, and optimize the analysis result.

2. Dynamic Sharding Structure

In the distributed system for processing big data according to the embodiment of the present invention, the distributed DB is implemented with a plurality of shards in order to ensure concurrency. In FIG. 1 the number of shards is 10, but this is merely an example, and other numbers of shards can of course be used.

Furthermore, the number of shards may be set dynamically and changed as needed. That is, the number of shards can be determined based on the number of distributed servers constituting the distributed system. Thereafter, when the number of distributed servers changes, the number of shards can be increased or decreased accordingly. For example, a shard is added when a distributed server (node) is added, and the number of shards is reduced when a distributed server is removed.
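As a non-limiting illustration, the relationship between the number of distributed servers and the number of shards might be sketched in Python as follows. The proportional rule, the `shards_per_server` parameter, and the shard naming scheme are assumptions made for this example only; the embodiment does not fix any particular formula.

```python
from typing import List


def shard_count_for(num_servers: int, shards_per_server: int = 2) -> int:
    """Return a shard count proportional to the number of distributed servers.

    The proportional rule and the default ratio are illustrative assumptions.
    """
    if num_servers < 1:
        raise ValueError("at least one distributed server is required")
    return num_servers * shards_per_server


def rebalance(shards: List[str], target: int) -> List[str]:
    """Add or remove shard names until exactly `target` shards remain."""
    shards = list(shards)            # work on a copy, leave the input untouched
    while len(shards) < target:      # a server was added: add shards
        shards.append(f"shard-{len(shards)}")
    del shards[target:]              # a server was removed: drop surplus shards
    return shards


# Five servers, then one server is removed.
shards = rebalance([], shard_count_for(5))
shards_after_removal = rebalance(shards, shard_count_for(4))
```

In a real deployment, removing a shard would also require migrating its data to the remaining shards, which this sketch omits.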

In addition, the number of shards can be changed based on the predominant type of the input big data. For example, if the input data changes to unstructured data, the number of shards can be increased.
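The type-based rule might be sketched as follows. The `DataType` enumeration, the `step` size, and the choice to leave the shard count unchanged for every other transition are assumptions of this example; the description only states that shards can be added when the input changes to unstructured data.

```python
from enum import Enum


class DataType(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


def shards_after_type_change(current: int, old: DataType, new: DataType,
                             step: int = 1) -> int:
    """Increase the shard count when the predominant input type becomes unstructured.

    Other transitions leave the count unchanged here (an assumption; the
    description does not specify them).
    """
    if new is DataType.UNSTRUCTURED and old is not DataType.UNSTRUCTURED:
        return current + step
    return current


grown = shards_after_type_change(10, DataType.STRUCTURED, DataType.UNSTRUCTURED)
unchanged = shards_after_type_change(10, DataType.STRUCTURED, DataType.SEMI_STRUCTURED)
```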

It is also possible to dynamically change the number of shards based on the speed of the big data analysis in a distributed system. For example, if the analysis speed is slow, the number of shards can be increased, and if the analysis speed is fast, the number of shards can be reduced.
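A minimal sketch of the speed-based adjustment is given below. The throughput metric (records per second), the `low` and `high` thresholds, and the one-shard step are hypothetical values chosen for illustration and are not specified by the embodiment.

```python
def adjust_for_speed(current_shards: int, records_per_sec: float,
                     low: float = 1000.0, high: float = 5000.0,
                     min_shards: int = 1) -> int:
    """Return a new shard count based on the measured analysis speed.

    Below `low` the analysis is treated as slow and a shard is added;
    above `high` it is treated as fast and a shard is removed.
    Thresholds and step size are illustrative assumptions.
    """
    if records_per_sec < low:
        return current_shards + 1
    if records_per_sec > high:
        return max(min_shards, current_shards - 1)
    return current_shards


slow_case = adjust_for_speed(10, 400.0)    # slow analysis: one shard added
fast_case = adjust_for_speed(10, 9000.0)   # fast analysis: one shard removed
```

Keeping a band between the two thresholds avoids adding and removing shards on every small fluctuation in the measured speed.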

Through the dynamic sharding structure, it is possible to adapt to changes in the distributed environment, maximize distributed processing performance, and optimally support the concurrency needed to collect the processing results.

3. TMO-based scheduling

The distributed system for processing big data according to the embodiment of the present invention performs Time-triggered Message-triggered Object (TMO)-based scheduling in order to secure real-time performance.

The source stream data stored in the distributed system includes structured stream data (data that can be used directly for statistical analysis, e.g., regional demographics), semi-structured stream data (data not readily usable for statistical analysis, e.g., newspaper articles), and unstructured stream data (raw-format data, e.g., video, images, SNS text).

The distributed system for processing big data according to the embodiment of the present invention adopts TMO-based scheduling in order to process the various types of source stream data in a complex real-time manner.

With TMO-based scheduling, the scheduling of big data analysis in a distributed system environment can be divided into two groups: periodic, time-driven deadline scheduling and event-based deadline scheduling.
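The two groups might be modeled as in the following sketch, in which time-triggered jobs are released at fixed periods and message-triggered jobs are released when an event arrives, each with its own deadline. The `Job` record, the function names, and the earliest-deadline ordering are assumptions of this example and are not taken from the embodiment or from any TMO library.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    name: str
    release: float    # time at which the job becomes ready
    deadline: float   # absolute time by which it must complete
    trigger: str      # "time" (periodic) or "message" (event-driven)


def periodic_jobs(name: str, period: float, relative_deadline: float,
                  horizon: float) -> List[Job]:
    """Release one time-triggered job per period within [0, horizon)."""
    jobs = []
    k = 0
    while k * period < horizon:
        release = k * period
        jobs.append(Job(name, release, release + relative_deadline, "time"))
        k += 1
    return jobs


def event_job(name: str, arrival: float, relative_deadline: float) -> Job:
    """Release a message-triggered job when an event arrives."""
    return Job(name, arrival, arrival + relative_deadline, "message")


def by_deadline(jobs: List[Job]) -> List[Job]:
    """Order both groups together, earliest absolute deadline first."""
    return sorted(jobs, key=lambda j: (j.deadline, j.release))


queue = by_deadline(periodic_jobs("stats", 10.0, 5.0, 30.0)
                    + [event_job("alert", 12.0, 2.0)])
```

Here the event-driven `alert` job, arriving at time 12 with a deadline of 14, is ordered ahead of the periodic `stats` job whose deadline is 15.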

This makes it possible to select a distributed server that can complete the scheduled work within the deadline, so that all distributed servers finish their analysis within the deadline without overlapping.
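One way to illustrate deadline-aware server selection is the sketch below. It assumes that each server processes one job at a time, that the job duration is known in advance, and that the server finishing earliest is preferred; the function name `pick_server` and the `busy_until` table are hypothetical.

```python
from typing import Dict, Optional


def pick_server(busy_until: Dict[str, float], release: float,
                duration: float, deadline: float) -> Optional[str]:
    """Pick a server that can finish the job by `deadline`, or None.

    `busy_until` maps each server to the time at which it becomes free.
    The chosen server's entry is updated so later jobs do not overlap.
    """
    best = None
    best_finish = None
    for server, free_at in sorted(busy_until.items()):
        finish = max(free_at, release) + duration
        if finish <= deadline and (best_finish is None or finish < best_finish):
            best, best_finish = server, finish
    if best is not None:
        busy_until[best] = best_finish   # reserve the time slot
    return best


busy = {"a": 0.0, "b": 3.0}
first = pick_server(busy, 0.0, 4.0, 5.0)    # only "a" can finish by t=5
second = pick_server(busy, 0.0, 4.0, 8.0)   # "b" finishes at 7, "a" at 8
third = pick_server(busy, 0.0, 4.0, 9.0)    # "a" finishes at 8, "b" at 11
fourth = pick_server(busy, 0.0, 4.0, 10.0)  # no server can meet t=10
```

A job for which no server qualifies returns `None`; how such a job is handled (rejection, deferral, or resharding) is left open here.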

4. Distributed system

FIG. 2 is a diagram illustrating a DBMS (Database Management System) according to another embodiment of the present invention. As shown in FIG. 2, the DBMS according to an embodiment of the present invention includes a network interface card (NIC) 110, a memory 120, a distributed server 130, and a distributed DB 140.

The NIC 110 is a communication means for accessing an external system and an external network and performing data communication.

The memory 120 is a volatile storage medium in which an in-memory DB is constructed and operated, and the distributed DB 140 is a large-capacity disk storage in which DB data is permanently stored and a restoration file and a duplicate file are stored. The distributed DB 140 is composed of a plurality of shards.

That is, the DBMS according to the embodiment of the present invention is a DBMS having a structure in which an in-memory DB and a distributed DB are linked.

The distributed server 130 constructs and operates an in-memory DB in the memory 120 and permanently stores the in-memory DB in the distributed DB 140. The distributed server 130 is physically or logically composed of a plurality of servers.
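The linkage between the in-memory DB and the sharded distributed DB might be sketched as a write-through store, as below. The class `LinkedStore`, the write-through policy, and CRC32-based shard selection are assumptions made for this example; plain dictionaries stand in for the disk-backed shards.

```python
import zlib
from typing import Dict, List, Optional


class LinkedStore:
    """In-memory DB linked to a sharded persistent store (illustrative only)."""

    def __init__(self, num_shards: int) -> None:
        self.memory: Dict[str, str] = {}                      # in-memory DB
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> Dict[str, str]:
        # CRC32 gives a stable mapping from key to shard across runs.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key: str, value: str) -> None:
        self.memory[key] = value            # fast path in memory
        self._shard_for(key)[key] = value   # persisted copy in the shard

    def get(self, key: str) -> Optional[str]:
        if key in self.memory:
            return self.memory[key]
        value = self._shard_for(key).get(key)
        if value is not None:
            self.memory[key] = value        # reload into memory on a miss
        return value

    def drop_memory(self) -> None:
        """Simulate loss of volatile memory, e.g., after a restart."""
        self.memory.clear()


store = LinkedStore(num_shards=4)
store.put("sensor-1", "42")
store.drop_memory()
recovered = store.get("sensor-1")   # served from the shard, then cached again
```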

Also, the distributed server 130 analyzes the data stored in the distributed DB 140 and performs TMO-based scheduling for the analysis. The distributed server 130 dynamically configures and changes the number of shards constituting the distributed DB 140 according to the distributed environment.

A big data processing method using the DBMS shown in FIG. 2 is shown in FIG. 3. FIG. 3 is a diagram provided for explaining a method of processing big data according to another embodiment of the present invention.

As shown in FIG. 3, the distributed server 130 determines the number of shards based on the distributed environment and configures the distributed DB (S210). In addition, the distributed server 130 performs TMO-based scheduling (S220) and performs big data analysis in the distributed environment (S230).

Thereafter, when a change occurs in the distributed environment (S240-Y), the distributed server 130 changes the number of shards to maintain the concurrency of the big data analysis (S250).
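The flow of steps S210 to S250 might be traced with the following sketch. It assumes that the distributed environment is represented only by the observed number of servers, that the shard count is proportional to it, and that scheduling and analysis are repeated after each reshard; the function `process` and its log format are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def process(server_counts: Iterable[int],
            shards_per_server: int = 2) -> List[Tuple[str, int]]:
    """Trace the S210-S250 flow over a sequence of observed server counts."""
    log: List[Tuple[str, int]] = []
    shards = 0
    previous: Optional[int] = None
    for servers in server_counts:
        if previous is None:
            shards = servers * shards_per_server
            log.append(("S210 configure", shards))   # initial shard setup
        elif servers != previous:                    # S240-Y: environment changed
            shards = servers * shards_per_server
            log.append(("S250 reshard", shards))
        log.append(("S220 schedule", shards))        # TMO-based scheduling
        log.append(("S230 analyze", shards))         # big data analysis
        previous = servers
    return log


trace = process([2, 2, 3])
```

With the input `[2, 2, 3]`, the trace configures 4 shards once, runs two schedule and analyze rounds, and then reshards to 6 when the third server appears.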

It goes without saying that the technical idea of the present invention can also be applied to a computer-readable recording medium having a computer program for performing the functions of the apparatus and method according to the present embodiment. In addition, the technical idea according to various embodiments of the present invention may be embodied in computer-readable code form recorded on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can be read by a computer and can store data. For example, the computer-readable recording medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical disk, a hard disk drive, or the like. In addition, the computer readable code or program stored in the computer readable recording medium may be transmitted through a network connected between the computers.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. On the contrary, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.

110: Memory
120: NIC
130: CPU
140: HDD

Claims

1. A data processing method comprising:
analyzing data in a distributed environment using a distributed system; and
dynamically changing a structure of a distributed DB constituting the distributed system based on the distributed environment,
wherein the changing step
removes at least one of the shards constituting the distributed DB or adds a new shard based on the data analysis speed,
removes at least one of the shards constituting the distributed DB or adds a new shard based on the type of data being input, and
adds a shard to the distributed DB when the input data changes from structured data to unstructured data.

2. (Deleted)

3. The method according to claim 1,
wherein the changing step
changes the structure of the distributed DB when the number of distributed servers constituting the distributed system changes.

4. The method according to claim 1,
wherein the distributed DB
is linked with an in-memory DB.

5. The method according to claim 1,
further comprising performing scheduling for data analysis by the distributed servers constituting the distributed system.

6. The method of claim 5,
wherein the performing of the scheduling
performs time-driven deadline scheduling and event-driven deadline scheduling.

7. The method of claim 5,
wherein the data
is composite data in which at least one of structured data, semi-structured data, and unstructured data is combined.

8. The method of claim 7,
wherein the performing of the scheduling
performs the scheduling based on a Time-triggered Message-triggered Object (TMO).

9. The method of claim 8,
wherein the performing of the scheduling
selects a distributed server capable of processing the scheduled work within a deadline.

10. A DBMS comprising:
a distributed DB; and
a distributed server that analyzes data in a distributed environment and dynamically changes a structure of the distributed DB based on the distributed environment,
wherein the distributed server
removes at least one of the shards constituting the distributed DB or adds a new shard based on the data analysis speed,
removes at least one of the shards constituting the distributed DB or adds a new shard based on the type of data being input, and
adds a shard to the distributed DB when the input data changes from structured data to unstructured data.

11. A data processing method comprising:
performing scheduling for data analysis by the distributed servers constituting a distributed system based on a Time-triggered Message-triggered Object (TMO);
analyzing the data in a distributed environment according to the scheduling result; and
dynamically changing a structure of a distributed DB constituting the distributed system based on the distributed environment,
wherein the changing step
removes at least one of the shards constituting the distributed DB or adds a new shard based on the data analysis speed,
removes at least one of the shards constituting the distributed DB or adds a new shard based on the type of data being input, and
adds a shard to the distributed DB when the input data changes from structured data to unstructured data.

12. A DBMS comprising:
a distributed DB; and
a distributed server that performs scheduling for analysis of data stored in the distributed DB based on a Time-triggered Message-triggered Object (TMO), analyzes the data in a distributed environment according to the scheduling result, and dynamically changes a structure of the distributed DB based on the distributed environment,
wherein the distributed server
removes at least one of the shards constituting the distributed DB or adds a new shard based on the data analysis speed,
removes at least one of the shards constituting the distributed DB or adds a new shard based on the type of data being input, and
adds a shard to the distributed DB when the input data changes from structured data to unstructured data.