KR101830504B1 - In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment - Google Patents
- Publication number
- KR101830504B1 (application KR1020160060908A)
- Authority
- KR
- South Korea
- Prior art keywords
- distributed
- data
- constituting
- scheduling
- shards
- Prior art date
Classifications
-
- G06F17/30545—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/40—Data acquisition and logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G06F17/30194—
-
- G06F17/30318—
-
- G06F17/30595—
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computer Hardware Design (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
Abstract
A method and system for supporting in-memory DB connection for real-time analysis of big data in a distributed environment are provided. A data processing method according to an embodiment of the present invention analyzes data in a distributed environment using a distributed system and dynamically changes the structure of the distributed DB constituting the distributed system according to the distributed environment. Accordingly, the number of shards can be dynamically managed, thereby adaptively ensuring concurrency in a distributed environment, securing real-time performance through TMO-based scheduling, and ultimately improving the speed and efficiency of big data analysis.
Description
The present invention relates to a distributed computing technology, and more particularly, to a DBMS structure and a scheduling method for analyzing big data in real time based on a distributed environment.
Searching and analyzing big data stored in a DBMS causes significant performance degradation and inefficient management of the distributed system, and only fragmentary analysis constrained by a predetermined structure is possible.
In particular, when the data is large, response delays occur frequently and configuring the environment for a distributed deployment becomes complicated.
Currently, open-source real-time stream data processing engines such as Storm and Spark cannot support various types of data simultaneously.
Furthermore, when new nodes are added to analyze big data in a distributed environment using such open-source engines, the JVM (Java Virtual Machine) environment has to be set up again, which is cumbersome.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a big data processing method that dynamically manages the number of shards in order to secure concurrency, and a DBMS to which the method is applied.
It is another object of the present invention to provide a big data processing method that performs time-triggered message-triggered object (TMO)-based scheduling to ensure real-time performance, and a DBMS using the same.
According to an aspect of the present invention, there is provided a data processing method including: analyzing data based on a distributed environment using a distributed system; and dynamically changing the structure of the distributed DB constituting the distributed system based on the distributed environment.
In the changing step, at least one of the shards constituting the distributed DB may be removed or a new shard may be added.
The changing step may change the structure of the distributed DB when the number of distributed servers constituting the distributed system is changed.
The distributed DB can be linked to the in-memory DB.
The method may further include performing scheduling for analyzing data of distributed servers constituting the distributed system.
The scheduling may be performed as time-driven deadline scheduling and event-driven deadline scheduling.
In addition, the data may be composite data combining at least one of structured data, semi-structured data, and unstructured data.
The scheduling step may perform the scheduling based on a time-triggered message-triggered object (TMO).
The performing of the scheduling may select a distributed server capable of completing the scheduled processing within a deadline.
Meanwhile, according to another embodiment of the present invention, a DBMS includes: a distributed DB; and a distributed server for analyzing data based on a distributed environment and dynamically changing the structure of the distributed DB based on the distributed environment.
According to another aspect of the present invention, there is provided a data processing method including: performing scheduling for data analysis of distributed servers constituting a distributed system based on a time-triggered message-triggered object (TMO); and analyzing the data based on the distributed environment according to the scheduling result.
Meanwhile, according to another embodiment of the present invention, a DBMS includes: a distributed DB; and a distributed server for performing TMO-based scheduling for analyzing data stored in the distributed DB and analyzing the data based on a distributed environment according to the scheduling result.
As described above, according to the embodiments of the present invention, the number of shards can be dynamically managed to adaptively ensure concurrency in a distributed environment, real-time performance can be ensured through TMO-based scheduling, and the speed and efficiency of big data analysis can ultimately be improved.
FIG. 1 is a diagram schematically illustrating a method of processing big data according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a DBMS according to another embodiment of the present invention; and
FIG. 3 is a diagram provided for explaining a method of processing big data according to another embodiment of the present invention.
Hereinafter, the present invention will be described in detail with reference to the drawings.
1. Big Data Processing Method
FIG. 1 is a diagram schematically illustrating a method of processing big data according to an embodiment of the present invention.
As shown in FIG. 1, a method of processing big data according to an embodiment of the present invention includes mapping source stream data to the distributed systems by scheduling, and filtering the distributed systems for data analysis.
As shown in FIG. 1, the big data processing method according to the embodiment of the present invention is applied to a distributed system environment having a structure using a plurality of shards in order to ensure concurrency.
The number of shards in the distributed system is dynamic. That is, shards can be freely added and removed according to the needs of the distributed system environment.
Also, as shown in FIG. 1, the big data processing method according to an embodiment of the present invention uses time-triggered message-triggered object (TMO)-based scheduling for real-time processing.
As a result, it is possible to improve throughput for complex-type, high-speed stream big data, maximize query processing and analysis speed, and optimize the analysis results.
2. Dynamic Sharding Structure
In the distributed system for processing big data according to the embodiment of the present invention, the distributed DB is implemented with a plurality of shards in order to ensure concurrency. In FIG. 1, the number of shards is 10, which is merely an example; other numbers of shards can of course be used.
Furthermore, the number of shards may be determined dynamically and changed as needed. That is, the number of shards can be dynamically determined based on the number of distributed servers constituting the distributed system. Thereafter, when the number of distributed servers changes, shards can be added or removed accordingly. For example, a shard is added when a distributed server (node) is added, and a shard is removed when a distributed server is removed.
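As a rough illustration of tying the shard count to the number of distributed servers, the following Python sketch keeps a fixed number of shards per server. The class `ShardedDB`, the method `resize_for`, and the ratio of two shards per server are hypothetical names and values chosen for the example; the source does not specify how the count is derived, and a real system would also have to migrate data when shards are removed.

```python
from dataclasses import dataclass, field


@dataclass
class ShardedDB:
    """Toy distributed DB whose shard list tracks the number of servers."""
    shards_per_server: int = 2  # assumed ratio, not taken from the source
    shards: list = field(default_factory=list)

    def resize_for(self, server_count: int) -> None:
        """Add or remove shards so the total matches the server count."""
        target = max(1, server_count * self.shards_per_server)
        while len(self.shards) < target:   # server added: add shards
            self.shards.append(f"shard-{len(self.shards)}")
        while len(self.shards) > target:   # server removed: drop shards
            self.shards.pop()


db = ShardedDB()
db.resize_for(5)
print(len(db.shards))  # 10
db.resize_for(3)
print(len(db.shards))  # 6
```

With five servers this yields the ten shards used as the example in FIG. 1; removing two servers shrinks the list to six.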
In addition, the number of shards can be changed based on the predominant type of the input big data. For example, if the input data changes to unstructured data, shards can be added.
It is also possible to dynamically change the number of shards based on the speed of the big data analysis in a distributed system. For example, if the analysis speed is slow, the number of shards can be increased, and if the analysis speed is fast, the number of shards can be reduced.
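The two adjustment rules described above (by input data type and by analysis speed) could be combined in a single decision function, sketched below under stated assumptions. The function name `next_shard_count`, the thresholds `slow_below` and `fast_above`, the step size of one shard, and the bounds are all hypothetical; the source gives no concrete thresholds.

```python
def next_shard_count(current: int, records_per_sec: float,
                     slow_below: float, fast_above: float,
                     became_unstructured: bool = False,
                     min_shards: int = 1, max_shards: int = 64) -> int:
    """Return the shard count to use for the next interval.

    All thresholds and the one-shard step are illustrative assumptions.
    """
    target = current
    if became_unstructured:
        target += 1                     # input switched to unstructured data
    if records_per_sec < slow_below:
        target += 1                     # analysis is slow: add a shard
    elif records_per_sec > fast_above:
        target -= 1                     # analysis is fast: remove a shard
    return max(min_shards, min(max_shards, target))


print(next_shard_count(10, 50.0, slow_below=100.0, fast_above=1000.0))    # 11
print(next_shard_count(10, 5000.0, slow_below=100.0, fast_above=1000.0))  # 9
```

Clamping to `min_shards` and `max_shards` keeps the count from oscillating to zero or growing without bound when the measured rate stays outside the thresholds.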
Through the dynamic sharding structure, it is possible to adapt to changes in the distributed environment, maximize distributed processing performance, and optimally support the concurrency needed to collect the processing results.
3. TMO-based scheduling
The distributed system for processing big data according to the embodiment of the present invention performs time-triggered message-triggered object (TMO)-based scheduling in order to secure real-time performance.
The source stream data stored in the distributed system includes structured stream data (data that can be used for statistical analysis, e.g., regional demographics), semi-structured stream data (data not readily usable for statistical analysis, e.g., newspaper articles), unstructured stream data (raw data formats, e.g., moving pictures, pictures, SNS text), and the like.
The distributed system for processing big data according to the embodiment of the present invention adopts TMO-based scheduling in order to process these various types of source stream data together in real time.
With TMO-based scheduling, the scheduling for big data analysis in a distributed system environment can be divided into two groups: periodically triggered deadline scheduling and event-based deadline scheduling.
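The two groups can be pictured as two ways of releasing work into one deadline-ordered queue. The Python sketch below is not a TMO implementation and is not taken from the patent; it only shows periodic (time-triggered) and event-based releases being ordered by absolute deadline. The names `Task`, `DeadlineQueue`, `add_periodic`, and `add_event`, and all timing values, are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

_seq = count()  # tie-breaker so tasks with equal deadlines keep insertion order


@dataclass(order=True)
class Task:
    deadline: float                    # absolute deadline, used for ordering
    seq: int
    name: str = field(compare=False)
    kind: str = field(compare=False)   # "time" or "event"


class DeadlineQueue:
    """Earliest-deadline-first queue fed by periodic and event releases."""

    def __init__(self):
        self._heap = []

    def add_periodic(self, name, period, relative_deadline, horizon):
        """Release one time-triggered task per period until the horizon."""
        t = 0.0
        while t < horizon:
            heapq.heappush(
                self._heap, Task(t + relative_deadline, next(_seq), name, "time"))
            t += period

    def add_event(self, name, arrival, relative_deadline):
        """Release one event-triggered task when a message arrives."""
        heapq.heappush(
            self._heap, Task(arrival + relative_deadline, next(_seq), name, "event"))

    def pop_next(self):
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


q = DeadlineQueue()
q.add_periodic("aggregate", period=10, relative_deadline=5, horizon=30)
q.add_event("alert", arrival=12, relative_deadline=2)
order = []
while len(q):
    order.append(q.pop_next().name)
print(order)  # ['aggregate', 'alert', 'aggregate', 'aggregate']
```

The event-triggered `alert` (deadline 14) is served between the periodic releases with deadlines 5 and 15, which is the interleaving a shared deadline ordering is meant to produce.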
This makes it possible to select a distributed server that can complete the scheduled processing within the deadline, so that all distributed servers complete the analysis within the deadline without overlapping.
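One simple way to realize such a selection is to assign each job to the server that would finish it earliest and to reject the assignment if no server can finish before the deadline. The sketch below assumes each server processes one job at a time and that the job duration is known in advance; `Server`, `busy_until`, and `assign` are hypothetical names, and the patent does not state which selection rule it uses.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    busy_until: float = 0.0  # time at which already-assigned work finishes


def assign(servers, now, duration, deadline):
    """Pick the server that finishes earliest within the deadline, or None."""
    best, best_finish = None, None
    for s in servers:
        start = max(now, s.busy_until)   # jobs on one server never overlap
        finish = start + duration
        if finish <= deadline and (best_finish is None or finish < best_finish):
            best, best_finish = s, finish
    if best is not None:
        best.busy_until = best_finish    # reserve the slot on the chosen server
    return best


servers = [Server("a", busy_until=8.0), Server("b", busy_until=2.0)]
first = assign(servers, now=0.0, duration=3.0, deadline=6.0)
second = assign(servers, now=0.0, duration=3.0, deadline=6.0)
print(first.name if first else None)    # b
print(second.name if second else None)  # None
```

The second request is rejected because server `b` is now reserved until time 5 and server `a` until time 8, so neither can finish a 3-unit job by time 6; a caller could react by relaxing the deadline or adding capacity.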
4. Distributed system
FIG. 2 is a diagram illustrating a DBMS (Database Management System) according to another embodiment of the present invention. As shown in FIG. 2, a DBMS according to an exemplary embodiment of the present invention includes a network interface card (NIC) 110, a
The NIC 110 is a communication means for accessing an external system and an external network and performing data communication.
The
That is, the DBMS according to the embodiment of the present invention is a DBMS having a structure in which an in-memory DB and a distributed DB are linked.
The
Also, the
A big data processing method using the DBMS shown in FIG. 2 is shown in FIG. 3. FIG. 3 is a diagram provided for explaining a method of processing big data according to another embodiment of the present invention.
As shown in FIG. 3, the
Thereafter, when a change occurs in the distributed environment (S240-Y), the
It goes without saying that the technical idea of the present invention can also be applied to a computer-readable recording medium having a computer program for performing the functions of the apparatus and method according to the present embodiment. In addition, the technical idea according to various embodiments of the present invention may be embodied in computer-readable code form recorded on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can be read by a computer and can store data. For example, the computer-readable recording medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical disk, a hard disk drive, or the like. In addition, the computer readable code or program stored in the computer readable recording medium may be transmitted through a network connected between the computers.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. It will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.
110: Memory
120: NIC
130: CPU
140: HDD
Claims (12)
And dynamically changing a structure of a distributed DB constituting a distributed system based on a distributed environment,
In the changing step,
Based on the data analysis rate, at least one of the shards constituting the distributed DB is removed or a new shard is added,
Based on the type of data to be input, removes at least one of the shards constituting the distributed DB, adds a new shard,
Wherein when the input data is changed from structured data to unstructured data, a shard constituting the distributed DB is added.
In the changing step,
And changes the structure of the distributed DB when the number of distributed servers constituting the distributed system is changed.
In the distributed DB,
In-memory DB.
And performing scheduling for data analysis of distributed servers constituting the distributed system.
Wherein the performing of the scheduling comprises:
And performing time-driven deadline scheduling and event-driven deadline scheduling.
The data includes:
Wherein the data is composite data combining at least one of structured data, semi-structured data, and unstructured data.
Wherein the performing of the scheduling comprises:
Wherein the scheduling is performed based on Time-triggered Message-triggered Object (TMO).
Wherein the performing of the scheduling comprises:
And selects a distributed server capable of processing scheduling in a deadline.
And a distributed server for analyzing data based on a distributed environment and dynamically changing a structure of the distributed DB based on the distributed environment,
The distributed server includes:
Based on the data analysis rate, at least one of the shards constituting the distributed DB is removed or a new shard is added,
Based on the type of data to be input, removes at least one of the shards constituting the distributed DB, adds a new shard,
And adding a shard constituting the distributed DB when the input data is changed from structured data to unstructured data.
Analyzing the data based on the distributed environment according to the scheduling result; And
And dynamically changing a structure of a distributed DB constituting a distributed system based on a distributed environment,
In the changing step,
Based on the data analysis rate, at least one of the shards constituting the distributed DB is removed or a new shard is added,
Based on the type of data to be input, removes at least one of the shards constituting the distributed DB, adds a new shard,
And adding a shard constituting the distributed DB if the input data is changed from structured data to unstructured data.
And a distributed server that performs scheduling for analysis of data stored in the distributed DB based on a time-triggered message-triggered object (TMO), analyzes data based on a distributed environment according to a scheduling result, and dynamically changes the structure of the distributed DB,
The distributed server includes:
Based on the data analysis rate, at least one of the shards constituting the distributed DB is removed or a new shard is added,
Based on the type of data to be input, removes at least one of the shards constituting the distributed DB, adds a new shard,
And adding a shard constituting the distributed DB when the input data is changed from structured data to unstructured data.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160060908A KR101830504B1 (en) | 2016-05-18 | 2016-05-18 | In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment |
PCT/KR2016/005418 WO2017200129A1 (en) | 2016-05-18 | 2016-05-23 | Method and system for processing high-speed stream big data on basis of in-memory db |
US15/464,798 US10671636B2 (en) | 2016-05-18 | 2017-03-21 | In-memory DB connection support type scheduling method and system for real-time big data analysis in distributed computing environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160060908A KR101830504B1 (en) | 2016-05-18 | 2016-05-18 | In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170130178A (en) | 2017-11-28 |
KR101830504B1 (en) | 2018-02-20 |
Family
ID=60326031
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160060908A KR101830504B1 (en) | 2016-05-18 | 2016-05-18 | In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR101830504B1 (en) |
WO (1) | WO2017200129A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102036314B1 (en) * | 2017-12-29 | 2019-10-25 | (주)터보소프트 | Spatial web object retrieval system based on distributed processing and spatial web object retrieval method in using same |
KR102464740B1 (en) * | 2019-02-28 | 2022-11-08 | 한국정보통신기술협회 | Test automation framework for dbms for analysis of bigdata and method of test automation |
KR20230125586A (en) | 2022-02-21 | 2023-08-29 | 주식회사 케이티 | Method for providing seamless service in a database and method for changing a service structure for the same |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8037024B1 (en) * | 2008-06-04 | 2011-10-11 | Google Inc. | Data propagation in a multi-shard database system |
US20120136835A1 (en) * | 2010-11-30 | 2012-05-31 | Nokia Corporation | Method and apparatus for rebalancing data |
US9256659B1 (en) * | 2012-08-08 | 2016-02-09 | Amazon Technologies, Inc. | Systems and methods for generating database identifiers based on database characteristics |
KR101527634B1 (en) * | 2012-08-10 | 2015-06-10 | 네이버 주식회사 | Method and apparatus for providing sharding service |
KR20150112357A (en) * | 2014-03-27 | 2015-10-07 | (주)시엠아이코리아 | Sensor data processing system and method thereof |
-
2016
- 2016-05-18 KR KR1020160060908A patent/KR101830504B1/en active IP Right Grant
- 2016-05-23 WO PCT/KR2016/005418 patent/WO2017200129A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
KR20170130178A (en) | 2017-11-28 |
WO2017200129A1 (en) | 2017-11-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11882054B2 (en) | Terminating data server nodes | |
US11645183B1 (en) | User interface for correlation of virtual machine information and storage information | |
CN109063196B (en) | Data processing method and device, electronic equipment and computer readable storage medium | |
US10255234B2 (en) | Method for storing data elements in a database | |
CN109947668A (en) | The method and apparatus of storing data | |
US11093496B1 (en) | Performance-based query plan caching | |
CN111258978A (en) | Data storage method | |
CN103207919A (en) | Method and device for quickly inquiring and calculating MangoDB cluster | |
CN103116627A (en) | Database access method with high concurrency service-oriented architecture (SOA) technology and system | |
KR101830504B1 (en) | In-Memory DB Connection Support Type Scheduling Method and System for Real-Time Big Data Analysis in Distributed Computing Environment | |
US10671636B2 (en) | In-memory DB connection support type scheduling method and system for real-time big data analysis in distributed computing environment | |
US11714573B1 (en) | Storage optimization in a distributed object store | |
US20210097049A1 (en) | Method, device and computer program product for managing index tables | |
CN103761255A (en) | Method and system for optimizing data storage of NoSQL mode | |
Zhang et al. | Design and implementation of a real-time interactive analytics system for large spatio-temporal data | |
CN108509438B (en) | ElasticSearch fragment expansion method | |
CN109739883A (en) | Promote the method, apparatus and electronic equipment of data query performance | |
CN113934361B (en) | Method, apparatus and computer program product for managing a storage system | |
US10712959B2 (en) | Method, device and computer program product for storing data | |
CN111563123A (en) | Live warehouse metadata real-time synchronization method | |
WO2013053254A1 (en) | Index creation method and system | |
Lai et al. | Evaluating data storage structures of MapReduce | |
CN111078635B (en) | Data processing method based on Hadoop | |
Aydin et al. | Distributed log analysis on the cloud using mapreduce | |
KR102186918B1 (en) | Real-Time High-Speed Stream Data Processing Method based on Distributed Computing Environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
AMND | Amendment | ||
E601 | Decision to refuse application | ||
AMND | Amendment | ||
X701 | Decision to grant (after re-examination) | ||
GRNT | Written decision to grant |