KR20150089693A - Apparatus and Method for Extending Data Store System Based on Big Data Platform - Google Patents
- Publication number
- KR20150089693A (application KR1020140010605A)
- Authority
- KR
- South Korea
- Prior art keywords
- data
- storage system
- data storage
- big data
- platform
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
Description
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to an apparatus and method for expanding a data storage system based on a big data platform, and more particularly, to an apparatus and method for expanding such a data storage system in a plug-in manner.
In general, the technologies required for utilizing Big Data include Big Data Collection Technology, Big Data Storage Management Technology, Big Data Processing Technology, Big Data Analysis Technology, and Knowledge Visualization Technology.
First, in big data collection technology, data is divided into internal data and external data according to the location of the data source. Internal data collection mainly accesses internal file systems and database management systems, while external data collection gathers data from outside sources over an Internet connection.
ETL (Extraction, Transformation, Loading), an internal data collection method, extracts the necessary data from various source systems, transforms it, and loads it for storage or analysis. It performs the entire process from extraction through transformation to loading.
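As an illustration of the extract, transform, and load stages described above, the following is a minimal sketch only. The CSV source, the `sales` table, and all function names are assumptions made for the example and do not come from the patent.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file system or a DBMS.
SOURCE_CSV = "name,amount\nalice,10\nbob,abc\ncarol,25\n"

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a non-numeric amount and normalize types."""
    cleaned = []
    for row in rows:
        if row["amount"].isdigit():
            cleaned.append((row["name"].upper(), int(row["amount"])))
    return cleaned

def load(records, conn):
    """Load: store the transformed records in a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

def run_etl(text, conn):
    """Run the three stages in order and return what ended up in the target."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()

conn = sqlite3.connect(":memory:")
result = run_etl(SOURCE_CSV, conn)
```

Here the malformed `bob` row is filtered out during the transform stage, so only two records reach the target table.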
A crawling engine, an external data collection method, collects documents by having a robot follow web links, which are entangled like a web, and create copies of every page of the sites it visits.
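The link-following behavior of a crawling engine can be sketched as a breadth-first traversal. This is a simplified illustration that assumes an in-memory dictionary (`FAKE_WEB`) in place of real HTTP requests; the URLs and the `crawl` function are hypothetical.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page content, outgoing links).
FAKE_WEB = {
    "a.example/": ("home", ["a.example/x", "a.example/y"]),
    "a.example/x": ("page x", ["a.example/", "a.example/y"]),
    "a.example/y": ("page y", []),
    "a.example/orphan": ("unlinked", []),
}

def crawl(start, fetch):
    """Follow links breadth-first from `start`, copying each page exactly once."""
    copies = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in copies:
            continue  # already visited; avoids looping on cyclic links
        content, links = fetch(url)
        copies[url] = content
        queue.extend(link for link in links if link not in copies)
    return copies

pages = crawl("a.example/", lambda url: FAKE_WEB[url])
```

Pages that no visited page links to, such as `a.example/orphan` here, are never reached, which is an inherent property of link-following collection.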
Next, in big data storage management technology, data must be stored and managed effectively in order to extract knowledge and wisdom from it.
Data storage management means storing data safely and permanently so that it is not contaminated or lost and can be used in the future; big data storage management additionally calls for high scalability and high performance.
Big data storage management should be able to accommodate the large-capacity, unstructured, and real-time characteristics of big data. Technologies applied to this end include scale-out technology, which increases capacity and speed by using multiple nodes, the distributed file system (DFS), NoSQL, and non-disk-based database management systems, which address the speed problem by storing data in DRAM and flash memory.
A distributed file system divides and stores data across numerous servers in order to store and manage huge amounts of data; examples include the Google File System, the Hadoop Distributed File System, and the Amazon S3 File System. NoSQL refers to all DBMSs or data stores that do not use the relational data model or SQL and that feature horizontal scalability based on scale-out technology. High-performance non-disk-based DBMSs use DRAM or flash memory as the main data store.
Big data processing technology is needed because handling big data requires techniques that cope together with the huge volume of data, the speed at which it is generated, and the variety of data types.
Big data batch processing is a distributed/parallel method that divides big data across several servers, processes it on each server, and then collects and organizes the results; examples include Google MapReduce and Hadoop MapReduce. Big data real-time processing technologies include stream processing technologies such as InfoSphere Streams, Twitter Storm, and Yahoo's S4, and big data processing programming support technologies include Google Sawzall and Hadoop Pig.
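The divide, process, and collect pattern of batch processing can be illustrated with a small word count. This is a single-machine sketch under stated assumptions: threads stand in for separate servers, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative rather than part of any Hadoop or Google API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk):
    """Map: emit (word, 1) pairs for one partition of the input."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(mapped_partitions):
    """Shuffle: group all emitted values by key across partitions."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines, workers=2):
    # Split the input into one partition per simulated server.
    partitions = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, partitions))
    return reduce_phase(shuffle(mapped))

counts = word_count(["big data", "big platform", "data store data"])
```

Because each partition is mapped independently, adding workers changes only how the input is split, not the final counts.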
Next, representative big data analysis techniques include text mining, opinion mining, social network analysis, classification, clustering, machine learning, regression, and sentiment analysis. R, an open-source software package for statistical processing, is the most popular big data analysis tool.
Finally, knowledge visualization techniques visualize the many patterns that data produces, so that one can understand intuitively and clearly what is happening and how it may evolve. In other words, knowledge visualization expresses data analysis results in a form that anyone can understand intuitively.
Big data is analyzed because the analysis makes it possible to predict the future.
Meanwhile, the major technologies related to big data processing can be classified into infrastructure aspects and analysis-method aspects. In terms of infrastructure, big data processing technology focuses on processing terabytes or petabytes of data in real time.
Companies such as Yahoo, Amazon, and Google, which dominated Internet traffic early on, have taken the lead in developing cloud infrastructure technology and releasing it as open source so that big data can be analyzed.
Among these efforts, the open-source Hadoop project, which was derived from Google's Google File System (GFS) and MapReduce technology, has developed a representative distributed file system and programming model for cloud computing.
The big data processing method is as follows. First, big data is collected; the collected big data is stored in mass storage; the stored big data is analyzed in a distributed/parallel manner; and the analysis results are presented to the user through various views.
Flume, Chukwa, and Scribe are data collection solutions that easily and flexibly handle the series of flows from data collection to storage. Flume is a representative data collection solution that was released as open source by Cloudera.
Flume was developed to efficiently collect, aggregate, and move bulk log data. It is an open-source log management tool with high reliability and availability: its fail-over function continues delivering events without losing data even in the event of a failure, it is scalable, and the data flow control, node monitoring, configuration changes, and output control of a large-scale system can be managed centrally and dynamically (ease of management).
Next is the data storage and batch analysis platform. Because it is almost impossible to store, analyze, and utilize exponentially growing big data with existing approaches, Hadoop has become the representative big data storage and analysis technology.
Hadoop is an open-source distributed processing technology project and is currently the preferred platform for both structured and unstructured big data analysis. It includes a distributed file system (HDFS, the Hadoop Distributed File System) and MapReduce: cost-effective x86 PC nodes are configured as distributed file system storage (HDFS), and a Java-based MapReduce framework provides batch analysis of the huge datasets stored in HDFS.
There are also various other open-source distributed processing projects based on Hadoop.
The Hadoop platform can be configured for a wide variety of analytical purposes. In addition to its basic elements, the distributed file system HDFS and the distributed data processing system MapReduce, the Hadoop platform includes the distributed database HBase, the search engine Nutch, Pig, the data warehousing solution Hive, and HCatalog, a table and storage management service.
Meanwhile, open-source engines such as R and Mahout are available for analyzing big data.
With R, many packages can be used freely, including basic statistical analysis packages, data mining packages, and packages for distributed processing in a Hadoop environment, because numerous third-party packages built as plug-ins are available.
For example, RHIPE (R and Hadoop Integrated Programming Environment), created by Saptarshi Guha of Purdue University, links R to MapReduce in a Hadoop environment so that millions of data items can be analyzed in a very short time.
In addition, big data visualization technology collects and analyzes the various data logs generated by social networks and servers and renders the results in visual form to improve the efficiency of IT resources. It is a technique for expressing and conveying data effectively.
R, a representative visualization technology, provides a language and development environment for statistical computing and visualization. It supports everything from basic statistical techniques to modeling, modern data mining, simulation, and numerical analysis. It is easy to link with other programming languages and is increasingly used for new drug research and financial forecasting analysis.
On the other hand, existing Big Data Platform structures depend on specific data storage systems, making it difficult to add or expand newly developed open source or commercial systems.
In other words, when a new data system is added to the current platform structure, an existing data storage system based on the Big Data Platform requires modifying the tightly coupled execution environment or the source code of the existing data storage system. As a result, service interruption is inevitable, system instability increases, and management and maintenance costs are incurred.
In addition, a user must go through a learning process to learn the technical interface before actually utilizing a newly added data system, which is inconvenient.
SUMMARY OF THE INVENTION The present invention has been made in view of the above problems, and it is an object of the present invention to provide a big data platform-based data storage system expansion apparatus and method capable of adding a new data system (service function) in a plug-in manner.
It is another object of the present invention to provide a big data platform-based data storage system expansion apparatus and method that offer users the added functions (new service functions) through data templates, so that data can be modeled and used in a service form without learning about the infrastructure of the platform.
According to one aspect of the present invention, there is provided a method for expanding a data storage system based on a Big Data Platform, the method comprising: adding a New DS (Data Store) bundle corresponding to a data storage system to be expanded to the data store plug-in of a specific OSGi (Open Service Gateway Initiative) framework; defining a data modeling template that corresponds to the added New DS (Data Store) bundle and supports data modeling, and adding the defined data modeling template to the template store; adding an interface corresponding to the added New DS (Data Store) bundle to the local service registry of the specific OSGi framework; and releasing the data service corresponding to the added New DS (Data Store) bundle to each server of the entire OSGi framework.
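The four steps of the method can be sketched as follows. This is a rough illustration, not the patented implementation: `OsgiFrameworkNode` and `Platform` are hypothetical stand-ins written in plain Python rather than the OSGi API, and the dictionary used as a data modeling template is an assumed format.

```python
class OsgiFrameworkNode:
    """Hypothetical stand-in for one server's OSGi framework (not the OSGi API)."""

    def __init__(self, name):
        self.name = name
        self.data_store_plugins = {}      # bundle name -> bundle object
        self.local_service_registry = {}  # bundle name -> interface name
        self.known_services = set()       # services released to this node


class Platform:
    """Hypothetical platform holding all framework nodes and the template store."""

    def __init__(self, node_names):
        self.nodes = {n: OsgiFrameworkNode(n) for n in node_names}
        self.template_store = {}

    def extend(self, node_name, bundle_name, bundle, template, interface):
        node = self.nodes[node_name]
        # Step 1: add the New DS bundle to the data store plug-in of one framework.
        node.data_store_plugins[bundle_name] = bundle
        # Step 2: add the matching data modeling template to the template store.
        self.template_store[bundle_name] = template
        # Step 3: add the bundle's interface to that framework's local registry.
        node.local_service_registry[bundle_name] = interface
        # Step 4: release the data service to every server of the whole framework.
        for other in self.nodes.values():
            other.known_services.add(bundle_name)


platform = Platform(["node-1", "node-2", "node-3"])
platform.extend(
    "node-1",
    "new-ds",
    bundle=object(),                          # placeholder for the real bundle
    template={"fields": ["key", "value"]},    # assumed template format
    interface="NewDsService",                 # hypothetical interface name
)
```

In this sketch only `node-1` holds the bundle and its interface, while every node learns that the `new-ds` service exists, which mirrors the distinction between the local registration and the release to the entire framework.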
According to the present invention, a data storage system based on a big data platform can be easily extended in a plug-in manner, and a service can be provided through an extended data storage system.
In other words, the data modeling template is provided so that it is possible to utilize a newly supported (added) data storage system without a separate learning process.
In particular, it is possible to expand, maintain and repair the data storage system, thereby reducing the management cost.
The new data storage system can be plugged in independently and added dynamically to the existing data storage system, and its service can be released. Therefore, a service using the added data storage system can be provided to the user without service interruption, the quality of service can be improved, and the stability of the system can be maintained.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a diagram for explaining the concept of a data storage system expansion apparatus based on a Big Data Platform according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the Big Data Platform-based data storage system expansion apparatus of the present invention in more detail.
FIG. 3 is a flowchart illustrating a method of expanding a data storage system based on a Big Data Platform according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS The advantages and features of the present invention, and the manner of achieving them, will become apparent from the embodiments described in detail below with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will enable a person skilled in the art to readily understand the scope of the invention, which is defined by the claims. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In the present specification, the singular form includes the plural form unless otherwise specified. As used herein, "comprises" or "comprising" does not exclude the presence or addition of one or more other components, steps, operations, and/or elements.
The present invention relates to an apparatus and method for expanding a data storage system based on a big data platform in a plug-in manner, in which a data template corresponding to the extended data service is provided so that the user can use the extended data storage system in the form of a service.
Hereinafter, a Big Data Platform-based data storage system expansion apparatus according to an embodiment of the present invention will be described with reference to FIGS. 1 and 2. FIG. 1 is a diagram for explaining the concept of a data storage system expansion apparatus based on a Big Data Platform according to an embodiment of the present invention, and FIG. 2 is a diagram for explaining the Big Data Platform-based data storage system expansion apparatus in more detail.
1, the Big Data Platform-based data storage
The
In addition, the
The
Meanwhile, the Big Data Platform-based data storage
Accordingly, the
OSGi (Open Service Gateway Initiative) is a standard for technology that dynamically installs and runs new services on network devices. Control devices belonging to the network remotely install bundles on controlled devices on which the OSGi service platform is built, and use the services provided by those bundles.
A bundle is the basic unit of distribution and management in a network; it includes at least one OSGi service and is managed by the OSGi framework.
In other words, the OSGi framework provides a standardized execution environment for various applications (bundles).
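The relationship between a framework, its bundles, and the services they provide can be illustrated with a toy model. This is a plain-Python analogy and not the OSGi API: the `Framework` and `Bundle` classes and the `kv.get` service are hypothetical, and the lifecycle is simplified by omitting the transient STARTING and STOPPING states of the OSGi bundle lifecycle.

```python
from enum import Enum


class BundleState(Enum):
    INSTALLED = "INSTALLED"
    RESOLVED = "RESOLVED"
    ACTIVE = "ACTIVE"
    UNINSTALLED = "UNINSTALLED"


class Bundle:
    """A unit of deployment that carries at least one service."""

    def __init__(self, name, services):
        self.name = name
        self.services = services  # service name -> callable
        self.state = BundleState.INSTALLED


class Framework:
    """Toy framework that manages bundles and a shared service registry."""

    def __init__(self):
        self.bundles = {}
        self.service_registry = {}

    def install(self, bundle):
        self.bundles[bundle.name] = bundle

    def start(self, name):
        # Starting a bundle makes its services available to other code.
        bundle = self.bundles[name]
        self.service_registry.update(bundle.services)
        bundle.state = BundleState.ACTIVE

    def stop(self, name):
        # Stopping a bundle withdraws its services without removing the bundle.
        bundle = self.bundles[name]
        for service_name in bundle.services:
            self.service_registry.pop(service_name, None)
        bundle.state = BundleState.RESOLVED

    def uninstall(self, name):
        bundle = self.bundles.pop(name)
        bundle.state = BundleState.UNINSTALLED


fw = Framework()
kv_bundle = Bundle("kv-store", {"kv.get": lambda key: f"value-of-{key}"})
fw.install(kv_bundle)
fw.start("kv-store")
value = fw.service_registry["kv.get"]("a")
fw.stop("kv-store")
```

The point of the model is that services appear and disappear at run time as bundles start and stop, without restarting the framework itself.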
Hereinafter, the operation of the Big Data Platform-based data storage
2, the Big Data Platform based data storage
For example, the Big Data Platform-based data storage
The Big Data Platform-based data storage
In addition, the Big Data Platform based data storage
That is, the Big Data Platform-based data storage
Accordingly, the
Also, the Big Data Platform-based data storage
That is, since the conventional OSGi framework operates on a single server, it is extended to a distributed environment so that it can be applied to the big data storage and execution structure of a big data platform-based data storage system that runs on distributed servers.
For example, the Big Data Platform-based data storage
As described above, according to the present invention, a data storage system based on a big data platform can be easily expanded in a plug-in manner, and a service can be provided through the extended data storage system. In other words, by providing a data modeling template, a newly supported (added) data storage system can be utilized without a separate learning process. In particular, the data storage system is easy to expand, maintain, and repair, thereby reducing the management cost. The new data storage system can be plugged into the existing data storage system independently and dynamically, and its service can be released. Therefore, the user can be served through the added data storage system without service interruption, the quality of service can be improved, and the stability of the system can be maintained.
The Big Data Platform-based data storage system expansion apparatus according to an embodiment of the present invention has been described above with reference to FIGS. 1 and 2. Hereinafter, a Big Data Platform-based data storage system expansion method according to an embodiment of the present invention is described with reference to FIG. 3. FIG. 3 is a flowchart illustrating a method of expanding a data storage system based on a Big Data Platform according to an embodiment of the present invention.
As shown in FIG. 3, the Big Data Platform-based data storage system expansion method of the present invention implements a bundle of a data storage system to be newly added to a big data platform-based data storage system (S300).
For example, a data storage system to be newly added is implemented as a New DS (Data Store)
The
For example, the
The technical interface corresponding to the added New DS (Data Store)
For example, a technical interface corresponding to the added New DS (Data Store)
It is determined whether or not the local activation is performed (S303).
For example, the implemented New DS (Data Store)
As a result of the determination, if the technical interface corresponding to the added New DS (Data Store)
For example, the Big Data Platform based data storage system extension method manages the OSGi master node to apply the OSGi framework based distributed system to the Big Data Platform based data storage system.
That is, since the conventional OSGi framework operates on a single server, it is extended to a distributed environment so that it can be applied to the big data storage and execution structure of a big data platform-based data storage system that runs on distributed servers.
It is determined whether the global activation is performed through the master node (S305).
For example, as the technical interface corresponding to the added New DS (Data Store)
When the master node completes notifying each OSGi framework (full distributed server) of the availability of the
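The local activation check (S303) and the global activation through the master node (S305) can be sketched as below. This is an assumption-laden illustration: `FrameworkNode`, `MasterNode`, the `reachable` flag, and the rule that global activation succeeds only when every node acknowledges the notification are all hypothetical choices, since the source does not specify how notifications or failures are handled.

```python
class FrameworkNode:
    """Hypothetical stand-in for the OSGi framework on one distributed server."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable        # assumed failure model for the sketch
        self.local_registry = set()       # interfaces registered locally
        self.available_services = set()   # services announced by the master

    def register_local(self, service):
        self.local_registry.add(service)

    def is_locally_active(self, service):
        # Corresponds to the local activation check.
        return service in self.local_registry

    def notify(self, service):
        # Returns True if this node received the availability notice.
        if self.reachable:
            self.available_services.add(service)
        return self.reachable


class MasterNode:
    """Hypothetical master node holding the cluster-wide service registry."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.master_registry = {}  # service -> name of the originating node

    def publish(self, origin, service):
        # Only a locally activated service is propagated to the cluster.
        if not origin.is_locally_active(service):
            return False
        self.master_registry[service] = origin.name
        acks = [node.notify(service) for node in self.nodes]
        # Global activation: every framework has been notified.
        return all(acks)


nodes = [FrameworkNode("n1"), FrameworkNode("n2"), FrameworkNode("n3")]
master = MasterNode(nodes)
nodes[0].register_local("new-ds")
globally_active = master.publish(nodes[0], "new-ds")
```

A service that was never registered locally is rejected before any node is notified, which keeps the master registry consistent with the local registries in this simplified model.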
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. Therefore, the scope of the present invention should not be limited by the illustrated embodiments, but should be determined by the scope of the appended claims and equivalents thereof.
110: service management unit 120: execution management unit
130: Data management unit 210: New DS bundle
220: Local Service Registry
230: OSGi Mastercluster Service Registry
240: Data Modeling Template
Claims (1)
A method for expanding a data storage system based on a Big Data Platform, the method comprising:
adding a New DS (Data Store) bundle corresponding to a data storage system to be extended to a data store plug-in of a specific OSGi (Open Service Gateway Initiative) framework;
defining a data modeling template that corresponds to the added New DS (Data Store) bundle and supports data modeling, and adding the defined data modeling template to the template store;
adding an interface corresponding to the added New DS (Data Store) bundle to a local service registry of the specific OSGi framework; and
releasing the data service corresponding to the added New DS (Data Store) bundle to each server of the entire OSGi framework.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020140010605A KR20150089693A (en) | 2014-01-28 | 2014-01-28 | Apparatus and Method for Extending Data Store System Based on Big Data Platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020140010605A KR20150089693A (en) | 2014-01-28 | 2014-01-28 | Apparatus and Method for Extending Data Store System Based on Big Data Platform |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20150089693A (en) | 2015-08-05 |
Family
ID=53886085
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020140010605A KR20150089693A (en) | 2014-01-28 | 2014-01-28 | Apparatus and Method for Extending Data Store System Based on Big Data Platform |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20150089693A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180008247A (en) * | 2016-07-14 | 2018-01-24 | 김경호 | Platform for providing task based on deep learning |
KR20190076122A (en) | 2017-12-22 | 2019-07-02 | 가톨릭관동대학교산학협력단 | Big Data Exploratory Data Analysis-based Visualization System |
KR102146116B1 (en) * | 2020-05-28 | 2020-08-20 | 주식회사 갑인정보기술 | A method of unstructured big data governance using open source analysis tool based on machine learning |
- 2014-01-28: KR application KR1020140010605A (patent KR20150089693A), not active, Application Discontinuation
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |