KR101830504B1 - In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment - Google Patents

In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment

Info

Publication number
KR101830504B1
Authority
KR
South Korea
Prior art keywords
distributed
data
constituting
scheduling
shards
Prior art date
Application number
KR1020160060908A
Other languages
Korean (ko)
Other versions
KR20170130178A (en)
Inventor
안재훈
손재기
박창원
Original Assignee
전자부품연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 전자부품연구원
Priority to KR1020160060908A
Priority to PCT/KR2016/005418 (WO2017200129A1)
Priority to US15/464,798 (US10671636B2)
Publication of KR20170130178A
Application granted
Publication of KR101830504B1

Classifications

    • G06F17/30545
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/40 Data acquisition and logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30194
    • G06F17/30318
    • G06F17/30595

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)

Abstract

A method and system for supporting in-memory DB connection for real-time analysis of big data in a distributed environment are provided. A data processing method according to an embodiment of the present invention analyzes data in a distributed environment using a distributed system and dynamically changes the structure of the distributed DB constituting the distributed system according to the distributed environment. Accordingly, the number of shards can be dynamically managed, thereby ensuring concurrency adaptively in a distributed environment, securing real-time performance through TMO-based scheduling, and ultimately improving the speed and efficiency of big data analysis.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an in-memory DB connection support type scheduling method and system for real-time big data analysis in a distributed computing environment.

The present invention relates to a distributed computing technology, and more particularly, to a DBMS structure and a scheduling method for analyzing big data in real time based on a distributed environment.

Searching and analyzing big data stored in a DBMS has limits: performance degrades markedly, the distributed system is hard to manage efficiently, and only fragmentary analysis within a predetermined structure is possible.

In particular, when the data is large, response delays occur frequently and the configuration needed to apply the system to a distributed environment becomes complicated.

Currently, open-source real-time stream data processing engines such as Storm and Spark cannot simultaneously support various types of data.

Furthermore, when new nodes are added to analyze big data in a distributed environment using such open-source engines, the JVM (Java Virtual Machine) environment must be configured anew, which is cumbersome.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a big data processing method that dynamically manages the number of shards in order to secure concurrency, and a DBMS to which the same is applied.

It is another object of the present invention to provide a big data processing method that performs time-triggered message-triggered object (TMO)-based scheduling as a way of ensuring real-time performance, and a DBMS using the same.

According to an aspect of the present invention, there is provided a data processing method including: analyzing data based on a distributed environment using a distributed system; and dynamically changing the structure of the distributed DB constituting the distributed system based on the distributed environment.

In the changing step, at least one of the shards constituting the distributed DB may be removed or a new shard may be added.

The changing step may change the structure of the distributed DB when the number of distributed servers constituting the distributed system is changed.

The distributed DB can be linked to the in-memory DB.

The method may further include performing scheduling for analyzing data of distributed servers constituting the distributed system.

The scheduling may comprise time-driven deadline scheduling and event-driven deadline scheduling.

In addition, the data may be composite data combining at least one of structured data, semi-structured data, and unstructured data.

The scheduling step may perform the scheduling based on a time-triggered message-triggered object (TMO).

The scheduling step may select a distributed server capable of completing the scheduled processing within a deadline.

Meanwhile, according to another embodiment of the present invention, a DBMS includes a distributed DB; and a distributed server for analyzing data based on a distributed environment and dynamically changing the structure of the distributed DB based on the distributed environment.

According to another aspect of the present invention, there is provided a data processing method including: performing scheduling for data analysis of distributed servers constituting a distributed system based on a time-triggered message-triggered object (TMO); and analyzing the data based on the distributed environment according to the scheduling result.

Meanwhile, according to another embodiment of the present invention, a DBMS includes a distributed DB; and a distributed server for performing TMO-based scheduling for analyzing data stored in the distributed DB and analyzing the data based on a distributed environment according to the scheduling result.

As described above, according to the embodiments of the present invention, the number of shards can be dynamically managed to ensure concurrency adaptively in a distributed environment, real-time performance can be ensured through TMO-based scheduling, and the speed and efficiency of big data analysis can be improved.

FIG. 1 is a diagram schematically illustrating a method of processing big data according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a DBMS according to another embodiment of the present invention.
FIG. 3 is a diagram provided for explaining a method of processing big data according to another embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the drawings.

1. Big Data Processing Method

FIG. 1 is a diagram schematically illustrating a method of processing big data according to an embodiment of the present invention.

As shown in FIG. 1, a method of processing big data according to an embodiment of the present invention includes mapping source stream data to the distributed systems by scheduling, and filtering the distributed systems for data analysis.

As shown in FIG. 1, the big data processing method according to the embodiment of the present invention is applied to a distributed system environment having a structure using a plurality of shards in order to ensure concurrency.

The number of shards in the distributed system is dynamic. That is, shards can be freely added or removed according to the needs of the distributed system environment.

Also, as shown in FIG. 1, the big data processing method according to an embodiment of the present invention uses time-triggered message-triggered object (TMO)-based scheduling for real-time processing.

As a result, it is possible to improve the throughput of the complex type and high-speed stream big data, maximize the query processing and analysis speed, and optimize the analysis result.

2. Dynamic Sharding Structure

In the distributed system for processing big data according to the embodiment of the present invention, the distributed DB is implemented with a plurality of shards in order to ensure concurrency. In FIG. 1 the number of shards is 10, which is merely an example; other numbers of shards can of course be used.

Furthermore, the number of shards may be determined dynamically and changed as needed. That is, the number of shards can initially be determined based on the number of distributed servers constituting the distributed system. Thereafter, when the number of distributed servers changes, the number of shards can be increased or reduced accordingly. For example, a shard is added when a distributed server (node) is added, and the number of shards is reduced when a distributed server is removed.
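The server-driven resizing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ShardedDB` class, the `shards_per_server` ratio, and the CRC32-based key placement are all assumptions introduced here, since the description does not specify how records are placed or moved.

```python
import zlib


class ShardedDB:
    """Toy distributed DB whose shard count follows the server count."""

    def __init__(self, shards_per_server=2):
        # Hypothetical ratio; the description only says the shard count
        # is based on the number of distributed servers.
        self.shards_per_server = shards_per_server
        self.shards = [{}]

    def _index(self, key):
        # Assumed placement rule: stable hash of the key modulo shard count.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self._index(key)][key] = value

    def get(self, key):
        return self.shards[self._index(key)].get(key)

    def resize_for(self, server_count):
        """Add or remove shards to match the server count, keeping all data."""
        target = max(1, server_count * self.shards_per_server)
        if target != len(self.shards):
            items = [kv for shard in self.shards for kv in shard.items()]
            self.shards = [{} for _ in range(target)]
            for key, value in items:
                self.put(key, value)  # redistribute under the new count
        return len(self.shards)
```

Redistributing every record on each resize keeps the sketch short; a production system would more likely move only the affected keys.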

In addition, the number of shards can be changed based on the predominant type of the input big data. For example, if the input data changes to unstructured data, the number of shards can be increased.

It is also possible to dynamically change the number of shards based on the speed of the big data analysis in a distributed system. For example, if the analysis speed is slow, the number of shards can be increased, and if the analysis speed is fast, the number of shards can be reduced.
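A minimal sketch of the two adjustment rules above (data type and analysis speed) follows. The function name, the `target_rate` threshold, the 1.5x "fast" margin, the one-shard step size, and the `max_shards` cap are all assumptions for illustration; the description states only the direction of each change.

```python
def next_shard_count(current, analysis_rate, target_rate,
                     became_unstructured, max_shards=64):
    """Return an adjusted shard count (hypothetical policy).

    analysis_rate and target_rate share one unit, e.g. records per second.
    became_unstructured is True when the input just changed to
    unstructured data.
    """
    count = current
    if became_unstructured:
        count += 1                      # unstructured input: add a shard
    if analysis_rate < target_rate:
        count += 1                      # analysis is slow: add a shard
    elif analysis_rate > 1.5 * target_rate:
        count -= 1                      # analysis is fast: remove a shard
    return max(1, min(max_shards, count))
```

For instance, `next_shard_count(10, 50, 100, False)` grows the count because the observed rate is below the target, while `next_shard_count(10, 200, 100, False)` shrinks it.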

Through the dynamic sharding structure, it is possible to adapt to changes in the distributed environment, maximize distributed processing performance, and optimally support the concurrency needed to collect the processing results.

3. TMO-based scheduling

The distributed system for processing big data according to the embodiment of the present invention performs time-triggered message-triggered object (TMO)-based scheduling in order to secure real-time performance.

The source stream data stored in the distributed system includes structured stream data (data that can be used directly for statistical analysis, e.g. regional demographics), semi-structured stream data (data not readily usable for statistical analysis, e.g. newspaper articles), unstructured stream data (raw data formats, e.g. video, pictures, SNS text), and the like.

The distributed system for processing big data according to the embodiment of the present invention adopts TMO-based scheduling in order to process the various types of source stream data in a complex real-time manner.

With TMO-based scheduling, the scheduling of big data analysis in a distributed system environment can be divided into two groups: periodic (time-driven) deadline scheduling and event-based deadline scheduling.

This makes it possible to select a distributed server that can complete the scheduled work within the deadline, so that all distributed servers finish their analysis within the deadline without overlapping.
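The deadline-aware server selection described above can be sketched as below. This is an illustrative simplification, not a TMO runtime: the `Task` fields, the earliest-deadline-first ordering, and the "earliest finish time" tie-break are assumptions. The only properties taken from the text are the two trigger kinds (time and event/message) and the requirement that a task goes to a server able to finish it within the deadline, with no overlap on any server.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    trigger: str     # "time" (periodic) or "message" (event-driven)
    release: float   # time at which the task becomes ready
    cost: float      # estimated analysis time
    deadline: float  # absolute completion deadline


def schedule(tasks, server_names):
    """Map each task name to a server that can meet its deadline, or None."""
    free_at = {name: 0.0 for name in server_names}
    plan = {}
    # Assumed ordering: earliest deadline first.
    for task in sorted(tasks, key=lambda t: t.deadline):
        best = None
        for name in server_names:
            finish = max(free_at[name], task.release) + task.cost
            if finish <= task.deadline and (best is None or finish < best[1]):
                best = (name, finish)
        if best is None:
            plan[task.name] = None          # no server can meet the deadline
        else:
            plan[task.name] = best[0]
            free_at[best[0]] = best[1]      # server is busy until then
    return plan
```

A task mapped to `None` signals that the current server pool cannot meet its deadline, which in this document's terms is one possible cue for changing the distributed environment.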

4. Distributed system

FIG. 2 is a diagram illustrating a DBMS (database management system) according to another embodiment of the present invention. As shown in FIG. 2, the DBMS according to the embodiment of the present invention includes a network interface card (NIC) 110, a memory 120, a distributed server 130, and a distributed DB 140.

The NIC 110 is a communication means for accessing an external system and an external network and performing data communication.

The memory 120 is a volatile storage medium in which an in-memory DB is constructed and operated, and the distributed DB 140 is a large-capacity disk storage in which DB data is permanently stored and a restoration file and a duplicate file are stored. The distributed DB 140 is composed of a plurality of shards.

That is, the DBMS according to the embodiment of the present invention is a DBMS having a structure in which an in-memory DB and a distributed DB are linked.

The distributed server 130 constructs and operates an in-memory DB in the memory 120 and permanently stores the in-memory DB in the distributed DB 140. The distributed server 130 physically or logically comprises a plurality of servers.
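The link between the in-memory DB and the disk-backed distributed DB can be sketched as follows. This is a hedged illustration only: the `LinkedStore` class, the JSON shard files, the `shard-N.json` naming, and the CRC32 placement are hypothetical choices, since the description does not define a storage format or a flush policy.

```python
import json
import zlib
from pathlib import Path


class LinkedStore:
    """In-memory dict (memory 120) persisted to shard files (distributed DB 140)."""

    def __init__(self, root, shard_count):
        self.root = Path(root)
        self.shard_count = shard_count
        self.memory = {}          # the in-memory DB

    def put(self, key, value):
        self.memory[key] = value  # writes land in memory first

    def flush(self):
        """Permanently store the in-memory DB across the shard files."""
        buckets = [{} for _ in range(self.shard_count)]
        for key, value in self.memory.items():
            index = zlib.crc32(key.encode("utf-8")) % self.shard_count
            buckets[index][key] = value
        for i, bucket in enumerate(buckets):
            (self.root / f"shard-{i}.json").write_text(json.dumps(bucket))

    def restore(self):
        """Rebuild the in-memory DB from whatever shard files exist."""
        self.memory = {}
        for path in sorted(self.root.glob("shard-*.json")):
            self.memory.update(json.loads(path.read_text()))
```

Because the memory is volatile, `restore` stands in for the recovery path that the restoration files in the distributed DB make possible.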

Also, the distributed server 130 analyzes the data stored in the distributed DB 140 and performs TMO-based scheduling for the analysis. The distributed server 130 dynamically configures and changes the number of shards constituting the distributed DB 140 according to the distributed environment.

A big data processing method using the DBMS shown in FIG. 2 is shown in FIG. 3. FIG. 3 is a diagram provided for explaining a method of processing big data according to another embodiment of the present invention.

As shown in FIG. 3, the distributed server 130 determines the number of shards based on the distributed environment and configures the distributed DB (S210). The distributed server 130 then performs TMO-based scheduling (S220) and performs big data analysis based on the distributed environment (S230).

Thereafter, when a change occurs in the distributed environment (S240-Y), the distributed server 130 changes the number of shards to maintain the concurrency of the big data analysis (S250).
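The flow of steps S210 to S250 can be traced with a small simulation. It is only a sketch under stated assumptions: the "distributed environment" is reduced to a list of observed server counts, the `shards_per_server` ratio is hypothetical, and the scheduling and analysis steps are represented by trace entries rather than real work.

```python
def run_cycles(server_counts, shards_per_server=2):
    """Trace S210..S250 for a sequence of observed server counts."""
    trace = []
    servers = server_counts[0]
    shards = max(1, servers * shards_per_server)
    trace.append(f"S210 shards={shards}")            # configure distributed DB
    for observed in server_counts[1:]:
        trace.append("S220 schedule")                # TMO-based scheduling
        trace.append(f"S230 analyze on {shards} shards")
        if observed != servers:                      # S240-Y: environment changed
            servers = observed
            shards = max(1, servers * shards_per_server)
            trace.append(f"S250 shards={shards}")    # change the shard count
    return trace
```

Calling `run_cycles([2, 2, 3])` yields one S210 entry, two schedule/analyze rounds on 4 shards, and a final S250 entry that raises the count to 6 once the third server appears.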

It goes without saying that the technical idea of the present invention can also be applied to a computer-readable recording medium having a computer program for performing the functions of the apparatus and method according to the present embodiment. In addition, the technical idea according to various embodiments of the present invention may be embodied in computer-readable code form recorded on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can be read by a computer and can store data. For example, the computer-readable recording medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical disk, a hard disk drive, or the like. In addition, the computer readable code or program stored in the computer readable recording medium may be transmitted through a network connected between the computers.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. On the contrary, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.

110: Memory
120: NIC
130: CPU
140: HDD

Claims (12)

1. A data processing method comprising:
analyzing data based on a distributed environment using a distributed system; and
dynamically changing a structure of a distributed DB constituting the distributed system based on the distributed environment,
wherein, in the changing step,
at least one of the shards constituting the distributed DB is removed or a new shard is added based on the data analysis rate,
at least one of the shards constituting the distributed DB is removed or a new shard is added based on the type of data to be input, and
a shard constituting the distributed DB is added when the input data is changed from structured data to unstructured data.

2. (Deleted)

3. The method according to claim 1, wherein the changing step changes the structure of the distributed DB when the number of distributed servers constituting the distributed system is changed.

4. The method according to claim 1, wherein the distributed DB is linked to an in-memory DB.

5. The method according to claim 1, further comprising performing scheduling for data analysis of distributed servers constituting the distributed system.

6. The method of claim 5, wherein the performing of the scheduling comprises performing time-driven deadline scheduling and event-driven deadline scheduling.

7. The method of claim 5, wherein the data is composite data in which at least one of structured data, semi-structured data, and unstructured data is combined.

8. The method of claim 7, wherein the performing of the scheduling performs the scheduling based on a time-triggered message-triggered object (TMO).

9. The method of claim 8, wherein the performing of the scheduling selects a distributed server capable of processing the scheduling within a deadline.

10. A DBMS comprising:
a distributed DB; and
a distributed server for analyzing data based on a distributed environment and dynamically changing a structure of the distributed DB based on the distributed environment,
wherein the distributed server
removes at least one of the shards constituting the distributed DB or adds a new shard based on the data analysis rate,
removes at least one of the shards constituting the distributed DB or adds a new shard based on the type of data to be input, and
adds a shard constituting the distributed DB when the input data is changed from structured data to unstructured data.

11. A data processing method comprising:
performing scheduling for data analysis of distributed servers constituting a distributed system based on a time-triggered message-triggered object (TMO);
analyzing the data based on the distributed environment according to the scheduling result; and
dynamically changing a structure of a distributed DB constituting the distributed system based on the distributed environment,
wherein, in the changing step,
at least one of the shards constituting the distributed DB is removed or a new shard is added based on the data analysis rate,
at least one of the shards constituting the distributed DB is removed or a new shard is added based on the type of data to be input, and
a shard constituting the distributed DB is added when the input data is changed from structured data to unstructured data.

12. A DBMS comprising:
a distributed DB; and
a distributed server that performs scheduling for analysis of data stored in the distributed DB based on a time-triggered message-triggered object (TMO), analyzes the data based on a distributed environment according to the scheduling result, and dynamically changes the structure of the distributed DB,
wherein the distributed server
removes at least one of the shards constituting the distributed DB or adds a new shard based on the data analysis rate,
removes at least one of the shards constituting the distributed DB or adds a new shard based on the type of data to be input, and
adds a shard constituting the distributed DB when the input data is changed from structured data to unstructured data.

KR1020160060908A 2016-05-18 2016-05-18 In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment KR101830504B1 (en)

Priority Applications (3)

Application Number | Priority Date | Filing Date | Title
KR1020160060908A (KR101830504B1) | 2016-05-18 | 2016-05-18 | In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment
PCT/KR2016/005418 (WO2017200129A1) | 2016-05-18 | 2016-05-23 | Method and system for processing high-speed stream big data on basis of in-memory db
US15/464,798 (US10671636B2) | 2016-05-18 | 2017-03-21 | In-memory DB connection support type scheduling method and system for real-time big data analysis in distributed computing environment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
KR1020160060908A (KR101830504B1) | 2016-05-18 | 2016-05-18 | In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment

Publications (2)

Publication Number | Publication Date
KR20170130178A | 2017-11-28
KR101830504B1 | 2018-02-20

Family

ID=60326031

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
KR1020160060908A (KR101830504B1) | In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment | 2016-05-18 | 2016-05-18

Country Status (2)

Country | Link
KR (1) | KR101830504B1
WO (1) | WO2017200129A1

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
KR102036314B1 * | 2017-12-29 | 2019-10-25 | (주)터보소프트 | Spatial web object retrieval system based on distributed processing and spatial web object retrieval method in using same
KR102464740B1 * | 2019-02-28 | 2022-11-08 | 한국정보통신기술협회 | Test automation framework for dbms for analysis of bigdata and method of test automation
KR20230125586A | 2022-02-21 | 2023-08-29 | 주식회사 케이티 | Method for providing seamless service in a database and method for changing a service structure for the same

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8037024B1 * | 2008-06-04 | 2011-10-11 | Google Inc. | Data propagation in a multi-shard database system
US20120136835A1 * | 2010-11-30 | 2012-05-31 | Nokia Corporation | Method and apparatus for rebalancing data
US9256659B1 * | 2012-08-08 | 2016-02-09 | Amazon Technologies, Inc. | Systems and methods for generating database identifiers based on database characteristics
KR101527634B1 * | 2012-08-10 | 2015-06-10 | 네이버 주식회사 | Method and apparatus for providing sharding service
KR20150112357A * | 2014-03-27 | 2015-10-07 | (주)시엠아이코리아 | Sensor data processing system and method thereof

Also Published As

Publication number | Publication date
KR20170130178A | 2017-11-28
WO2017200129A1 | 2017-11-23

Similar Documents

Publication Publication Date Title
US11882054B2 (en) Terminating data server nodes
US11645183B1 (en) User interface for correlation of virtual machine information and storage information
CN109063196B (en) Data processing method and device, electronic equipment and computer readable storage medium
US10255234B2 (en) Method for storing data elements in a database
CN109947668A (en) The method and apparatus of storing data
US11093496B1 (en) Performance-based query plan caching
CN111258978A (en) Data storage method
CN103207919A (en) Method and device for quickly inquiring and calculating MangoDB cluster
CN103116627A (en) Database access method with high concurrency service-oriented architecture (SOA) technology and system
KR101830504B1 (en) In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment
US10671636B2 (en) In-memory DB connection support type scheduling method and system for real-time big data analysis in distributed computing environment
US11714573B1 (en) Storage optimization in a distributed object store
US20210097049A1 (en) Method, device and computer program product for managing index tables
CN103761255A (en) Method and system for optimizing data storage of NoSQL mode
Zhang et al. Design and implementation of a real-time interactive analytics system for large spatio-temporal data
CN108509438B (en) ElasticSearch fragment expansion method
CN109739883A (en) Promote the method, apparatus and electronic equipment of data query performance
CN113934361B (en) Method, apparatus and computer program product for managing a storage system
US10712959B2 (en) Method, device and computer program product for storing data
CN111563123A (en) Live warehouse metadata real-time synchronization method
WO2013053254A1 (en) Index creation method and system
Lai et al. Evaluating data storage structures of MapReduce
CN111078635B (en) Data processing method based on Hadoop
Aydin et al. Distributed log analysis on the cloud using mapreduce
KR102186918B1 (en) Real-Time High-Speed Stream Data Processing Method based on Distributed Computing Environment

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
AMND Amendment
E601 Decision to refuse application
AMND Amendment
X701 Decision to grant (after re-examination)
GRNT Written decision to grant