KR20150089693A - Apparatus and Method for Extending Data Store System Based on Big Data Platform

Info

Publication number: KR20150089693A
Application number: KR1020140010605A
Authority: KR
Inventors: 원희선; 이두호; 원종호
Original assignee: 한국전자통신연구원
Priority date: 2014-01-28
Filing date: 2014-01-28
Publication date: 2015-08-05

Abstract

The present invention relates to an apparatus and a method for extending a data storage system based on a big data platform. The apparatus according to the present invention enables a user to add a new data system (a service function) as a plug-in, without modifying source code and without interrupting service, and provides the added function (the new service function) to the user through a data template, so that the user can model data in the form of a service and use it without learning the underlying structure of the platform.

Description

TECHNICAL FIELD

The present invention relates to an apparatus and method for expanding a data storage system based on a big data platform, and more particularly, to an apparatus and method for expanding a data storage system based on a big data platform in a plug-in manner.

BACKGROUND OF THE INVENTION

In general, the technologies required for utilizing Big Data include Big Data Collection Technology, Big Data Storage Management Technology, Big Data Processing Technology, Big Data Analysis Technology, and Knowledge Visualization Technology.

First, in big data collection technology, data is divided into internal data and external data according to the location of the data source. Internal data collection mainly accesses internal file systems, database management systems, and the like, whereas external data collection gathers data from outside over an Internet connection.

ETL (Extraction, Transformation, Loading), an internal data collection method, extracts the necessary data from various source systems, transforms it, and loads it into a target system for storage or analysis. It covers the entire process from extraction through transmission to loading.
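The extract, transform, and load stages described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the CSV source, the `traffic` table, and the function names are hypothetical and are not taken from the patent, and an in-memory SQLite database stands in for the target store.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: read raw rows from a source system (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: normalise account names and drop rows with non-numeric sizes."""
    cleaned = []
    for row in rows:
        size = row["bytes"].strip()
        if size.isdigit():
            cleaned.append((row["account"].strip().lower(), int(size)))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the target store."""
    conn.execute("CREATE TABLE IF NOT EXISTS traffic (account TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO traffic VALUES (?, ?)", records)
    conn.commit()


def run_etl(csv_text, conn):
    """Run the three stages in order."""
    load(conn, transform(extract(csv_text)))
```

Keeping the three stages as separate functions mirrors the way the text names them and lets each stage be replaced independently.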

A crawling engine, an external data collection method, is a robot that collects documents by following web links, which are interconnected like a spider's web, and creating a copy of every page of each visited site.

Next, regarding big data storage management technology, data must be stored and managed effectively in order to extract knowledge and wisdom from it.

Data storage management means storing data safely and permanently so that it is not corrupted or lost and can be used in the future. Big data storage management additionally requires high scalability and high performance.

Big data storage management should be able to accommodate the large-capacity, unstructured, and real-time characteristics of big data. The technologies applied include scale-out technology, which increases capacity and speed by utilizing multiple nodes; distributed file systems (DFS); NoSQL; and non-disk-based database management systems, which address the speed problem by storing data in DRAM and flash memory.

A distributed file system divides and stores data across numerous servers in order to store and manage huge amounts of data; examples include the Google File System, the Hadoop Distributed File System, and the Amazon S3 file system. NoSQL refers to DBMSs or data stores that do not use the relational data model or SQL and that feature horizontal scalability based on scale-out technology. High-performance non-disk-based DBMSs use DRAM or flash memory as the main data store.

Regarding big data processing technology, a technique is needed that can cope with the huge volume of data, the speed at which data is generated, and the variety of data types.

Big data batch processing technology is a distributed/parallel processing method that splits big data across several servers, processes each part separately, and then collects and organizes the results; examples include Google MapReduce and Hadoop MapReduce. Big data real-time processing technologies include stream processing technologies such as InfoSphere Streams, Twitter Storm, and Yahoo's S4. Programming support technologies for big data processing include Google Sawzall and Hadoop Pig.
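The split, process, and collect pattern described above can be illustrated with a small word count. This is a sketch only: worker threads stand in for separate servers, and the function names are illustrative assumptions rather than the API of Google MapReduce or Hadoop MapReduce.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, n_parts):
    """Divide the input into n_parts chunks, one per (simulated) server."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_chunk(chunk):
    """Map step: each worker counts the words in its own chunk independently."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Reduce step: collect the partial results and merge them into one."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records, n_parts=3):
    """Split the records, process the chunks in parallel, then merge."""
    chunks = split(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)
```

Because each chunk is counted without reference to the others, the map step can run on as many workers as there are chunks; only the final merge needs all partial results.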

Next, representative big data analysis techniques include text mining, opinion mining, social network analysis, classification, clustering, machine learning, regression, and sentiment analysis. R, an open-source software package for statistical processing, is the most popular big data analysis tool.

Finally, knowledge visualization technology visualizes the many patterns that the data produces, so that one can understand intuitively and clearly what is happening and how it may evolve. In other words, it is a technique for expressing data analysis results in a way that anyone can understand intuitively.

Big data is analyzed because doing so makes it possible to predict the future.

Meanwhile, the major technologies related to big data processing can be classified by infrastructure and by analysis method. In terms of infrastructure, big data processing technology focuses on processing terabytes or petabytes of data in real time.

Companies such as Yahoo, Amazon, and Google, which have dominated the Internet traffic early on, have taken the lead in developing cloud infrastructure technology and making it open source so that big data can be analyzed.

Among them, the open-source Hadoop project, which was derived from Google's Google File System (GFS) and MapReduce technology, provides a representative distributed file system and programming model for cloud computing.

The big data processing method is as follows. First, big data is collected; the collected big data is stored in mass storage; the stored big data is analyzed in a distributed/parallel manner; and the analyzed result is displayed to the user through various views.

Flume, Chukwa, and Scribe are data collection solutions that handle the series of flows from data collection to storage easily and flexibly. Flume is a representative data collection solution that was open-sourced by Cloudera.

Flume was developed to efficiently collect, aggregate, and move bulk log data. It is an open-source log management tool with high reliability and availability. It provides a fail-over function that can continue delivering events without losing data even in the event of a failure, as well as scalability and ease of management: the data flow control, node monitoring, configuration changes, and output control of a large-scale system can be managed centrally and dynamically.

Next is the platform for data storage and batch analysis. Because it is almost impossible to store, analyze, and utilize exponentially growing big data by conventional means, Hadoop has become the representative big data storage and analysis technology.

Hadoop is an open-source distributed processing project and is currently the preferred platform for analyzing both structured and unstructured big data. It consists of a distributed file system (HDFS, the Hadoop Distributed File System) and MapReduce. It builds distributed file system storage (HDFS) from cost-effective x86 PC nodes and provides a Java-based MapReduce framework for batch analysis of the huge datasets stored in HDFS.

There are also various other open-source distributed processing projects based on Hadoop.

The Hadoop platform can be configured for a wide variety of analytical purposes. In addition to its basic elements, the distributed file system (HDFS) and the distributed data processing system MapReduce, the Hadoop platform includes the distributed database HBase, the search engine Nutch, Pig, the data warehousing solution Hive, and HCatalog, a table and storage management service.

Meanwhile, R and Mahout are open-source statistical engines for analyzing big data.

With R, many packages, such as basic statistical analysis packages, data mining packages, and packages for distributed processing in a Hadoop environment, can be used freely, because numerous third-party packages are available as plug-ins.

For example, RHIPE (R and Hadoop Integrated Programming Environment), created by Saptarshi Guha of Purdue University, links R to MapReduce in a Hadoop environment so that millions of data items can be analyzed in a very short time.

In addition, big data visualization technology collects and analyzes the various data logs generated by social networks and servers and presents the results visually, for example to show how efficiently IT resources are being used. It is a technique for processing data and expressing it effectively.

R, a representative visualization technology, provides a language and development environment for statistical calculation and visualization. It can implement everything from basic statistical techniques to modeling, modern data mining, simulation, and numerical analysis. It is easy to link with other programming languages and is increasingly used for new drug research and financial forecasting.

On the other hand, existing Big Data Platform structures depend on specific data storage systems, making it difficult to add or expand newly developed open source or commercial systems.

In other words, when a new data system is added to the current platform structure, an existing Big Data Platform-based data storage system requires modifying the tightly coupled execution environment or the source code of the existing data storage system. As a result, service interruption is inevitable, system instability increases, and management and maintenance costs are incurred.

In addition, a user must go through a learning process to become familiar with the technical interface before actually utilizing a newly added data system, which is inconvenient.

SUMMARY OF THE INVENTION

The present invention has been made in view of the above problems, and it is an object of the present invention to provide a Big Data Platform-based data storage system expansion apparatus and method capable of adding a new data system (service function) in a plug-in manner.

It is another object of the present invention to provide a Big Data Platform-based data storage system expansion apparatus and method that provide the added function (new service function) to users through data templates, so that data can be modeled and used in the form of a service without learning the underlying structure of the platform.

According to one aspect of the present invention, there is provided a method for expanding a data storage system based on a Big Data Platform, the method comprising: adding a New DS (Data Store) bundle corresponding to a data storage system to be expanded to a data store plug-in of a specific OSGi (Open Service Gateway Initiative) framework; defining a data modeling template that corresponds to the added New DS (Data Store) bundle and supports data modeling, and adding the defined data modeling template to a template store; adding an interface corresponding to the added New DS (Data Store) bundle to a local service registry of the specific OSGi framework; and releasing the data service corresponding to the added New DS (Data Store) bundle to each server of the entire OSGi framework.

According to the present invention, a data storage system based on a big data platform can be easily extended in a plug-in manner, and a service can be provided through an extended data storage system.

In other words, the data modeling template is provided so that it is possible to utilize a newly supported (added) data storage system without a separate learning process.

In particular, it is possible to expand, maintain and repair the data storage system, thereby reducing the management cost.

The new data storage system can be plugged in independently and added dynamically to the existing data storage system, and its service can be disclosed. Therefore, a service using the added data storage system can be provided to the user without service interruption, the quality of service can be improved, and the stability of the system can be maintained.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram for explaining the concept of a data storage system expansion device based on a Big Data Platform according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the Big Data Platform-based data storage system expansion device of the present invention in more detail.
FIG. 3 is a flowchart illustrating a method of expanding a data storage system based on a Big Data Platform according to an embodiment of the present invention.

The advantages and features of the present invention and the manner of achieving them will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art; the invention is defined by the claims. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In the present specification, the singular form includes the plural form unless otherwise specified. The terms "comprises" and "comprising," as used herein, do not exclude the presence or addition of one or more other components, steps, operations, and/or elements.

The present invention relates to an apparatus and method for expanding a data storage system based on a big data platform in a plug-in manner, wherein a data template corresponding to the extended data service is provided so that the user can use the extended data storage system in the form of a service.

Hereinafter, a Big Data Platform-based data storage system expansion apparatus according to an embodiment of the present invention will be described with reference to FIGS. 1 and 2. FIG. 1 is a view for explaining the concept of the data storage system expansion device based on a Big Data Platform according to an embodiment of the present invention, and FIG. 2 is a diagram illustrating the Big Data Platform-based data storage system expansion device in more detail.

As shown in FIG. 1, the Big Data Platform-based data storage system expansion apparatus 100 of the present invention includes a service management unit 110, an execution management unit 120, and a data management unit 130.

The service management unit 110 acquires data (the user's request information and the like) through the Big Data Portal in response to the user's analysis service request, stores the acquired data, performs analysis modeling on the acquired or stored data through the analysis engine, and provides the analysis modeling results to the user through a visualization tool.

In addition, the service management unit 110 transmits the analysis modeling result to the execution management unit 120.

The execution management unit 120 accesses the necessary data through the data management unit 130 according to the analysis modeling result transmitted from the service management unit 110, and executes the corresponding processing.

Meanwhile, the Big Data Platform-based data storage system expansion apparatus 100 of the present invention performs expansion of a Big Data Platform-based data storage system through a plug-in based on an OSGi (Open Service Gateway Initiative) framework.

Accordingly, the data management unit 130 includes at least one OSGi framework.

OSGi (Open Service Gateway Initiative) is a standard for technology that dynamically installs and runs new services on network devices. Control devices belonging to the network can remotely install bundles on controlled devices on which the OSGi service platform is built, and use the services provided by those bundles.

A bundle is a basic unit of distribution and management in a network and includes at least one OSGi service and is managed by an OSGi framework.

In other words, the OSGi framework provides a standardized execution environment for various applications (bundles).

Hereinafter, the operation of the Big Data Platform-based data storage system expansion apparatus 100 will be described in detail with reference to FIG. 2.

As shown in FIG. 2, the Big Data Platform-based data storage system expansion apparatus 100 can add a New DS (Data Store) bundle 210, implemented to correspond to a data storage system to be extended (newly added), to the data store plug-in of the OSGi framework, and can discard a New DS (Data Store) bundle 210 that was added to the data store plug-in of the OSGi framework for an existing data storage system.

For example, the Big Data Platform-based data storage system expansion apparatus 100 manages the life cycle of the New DS (Data Store) bundle 210 by implementing a data storage system to be extended (newly added) as a New DS (Data Store) bundle 210 and adding it to the data store plug-in of the OSGi framework, and by discarding a New DS (Data Store) bundle 210 that was added to the OSGi framework's data store plug-in for an existing data storage system.
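The add and discard life cycle described above can be sketched in a few lines. This is a plain-Python model under stated assumptions, not OSGi code: `DataStorePlugin`, `DataStoreBundle`, and the three-state `BundleState` are hypothetical names, and the states are a simplification of the richer bundle life cycle that a real OSGi framework defines.

```python
from enum import Enum


class BundleState(Enum):
    """Simplified life-cycle states (an assumption for this sketch)."""
    INSTALLED = "installed"
    ACTIVE = "active"
    UNINSTALLED = "uninstalled"


class DataStoreBundle:
    """Hypothetical wrapper around one data storage system."""

    def __init__(self, name):
        self.name = name
        self.state = BundleState.INSTALLED


class DataStorePlugin:
    """Hypothetical plug-in point that holds the currently added bundles."""

    def __init__(self):
        self._bundles = {}

    def add_bundle(self, bundle):
        """Add a bundle at run time; existing bundles are left untouched."""
        if bundle.name in self._bundles:
            raise ValueError(f"bundle already added: {bundle.name}")
        bundle.state = BundleState.ACTIVE
        self._bundles[bundle.name] = bundle

    def discard_bundle(self, name):
        """Discard a previously added bundle and mark it uninstalled."""
        bundle = self._bundles.pop(name)
        bundle.state = BundleState.UNINSTALLED
        return bundle

    def active_bundles(self):
        """Names of the bundles currently plugged in, in sorted order."""
        return sorted(self._bundles)
```

Adding or discarding one bundle only touches that bundle's entry, which is the property that lets a new data store be introduced without stopping the others.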

When the Big Data Platform-based data storage system expansion apparatus 100 adds a New DS (Data Store) bundle 210 to the data store plug-in of the OSGi framework, it defines a data modeling template 240 that corresponds to the added New DS (Data Store) bundle 210 and supports data modeling, and adds the defined data modeling template 240 to the template store of the service management unit 110.
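One way to picture a data modeling template is as a declaration of the fields a user must supply for a given data store, so that the user fills in values rather than learning the store's interface. The sketch below makes that assumption; the patent does not specify the template format, and `DataModelingTemplate`, `TemplateStore`, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataModelingTemplate:
    """Hypothetical template: the fields a user fills in for one data store."""
    bundle_name: str
    required_fields: tuple

    def build_model(self, **values):
        """Turn user-supplied values into a data model for this data store."""
        missing = [f for f in self.required_fields if f not in values]
        if missing:
            raise ValueError(f"missing template fields: {missing}")
        model = {"data_store": self.bundle_name}
        for name in self.required_fields:
            model[name] = values[name]
        return model


class TemplateStore:
    """Hypothetical template store, keyed by the bundle each template serves."""

    def __init__(self):
        self._templates = {}

    def add(self, template):
        self._templates[template.bundle_name] = template

    def get(self, bundle_name):
        return self._templates[bundle_name]
```

A user who wants to use the new store would fetch its template by bundle name and call `build_model` with the requested values; the template, not the user, knows which fields the store needs.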

In addition, the Big Data Platform-based data storage system expansion device 100 adds a technical interface corresponding to the added New DS (Data Store) bundle 210 to the local service registry 220 of the OSGi framework, so that it can be utilized in a data analysis program.

That is, the Big Data Platform-based data storage system expansion device 100 registers and discloses the technical interface corresponding to the added New DS (Data Store) bundle 210 in the local service registry 220 of the OSGi framework, so that the data analysis program can utilize it.

Accordingly, the execution management unit 120 can directly access the technical interface registered and disclosed in the local service registry 220 of the OSGi framework.
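The registration and lookup relationship described above can be sketched as a simple name-to-implementation registry. This is an illustrative model only: `LocalServiceRegistry`, `InMemoryStore`, `run_analysis`, and the interface name used in the test are hypothetical, and a real OSGi service registry offers a considerably richer API.

```python
class LocalServiceRegistry:
    """Hypothetical registry mapping interface names to implementations."""

    def __init__(self):
        self._services = {}

    def register(self, interface_name, implementation):
        """Disclose an implementation under its interface name."""
        self._services[interface_name] = implementation

    def unregister(self, interface_name):
        """Withdraw a service; unknown names are ignored."""
        self._services.pop(interface_name, None)

    def lookup(self, interface_name):
        """Return the registered implementation, or None if absent."""
        return self._services.get(interface_name)


class InMemoryStore:
    """Stand-in for the technical interface of a newly added data store."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


def run_analysis(registry, interface_name, key):
    """Execution-manager side: resolve the interface at call time, then use it."""
    store = registry.lookup(interface_name)
    if store is None:
        raise LookupError(f"no service registered for {interface_name}")
    return store.get(key)
```

Because the caller resolves the interface by name at call time, it never needs a compile-time dependency on the new data store.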

Also, the Big Data Platform-based data storage system expansion device 100 manages an OSGi master node in order to apply an OSGi framework-based distributed system to the Big Data Platform-based data storage system.

That is, since the conventional OSGi framework operates on a single server, the conventional OSGi framework is extended to a distributed environment so that it can be applied to the big data storage and execution structure of a Big Data Platform-based data storage system, which runs on distributed servers.

For example, the Big Data Platform-based data storage system expansion device 100 manages the OSGi master node, releases the extended data service of the Big Data Platform-based data storage system to the entire OSGi framework-based distributed system through the OSGi master global service registry 230, and synchronizes the life cycle of the New DS (Data Store) bundle 210 added to the data store plug-in of a specific OSGi framework with the life cycle in each OSGi framework.

As described above, according to the present invention, a data storage system based on a big data platform can be easily expanded in a plug-in manner, and a service can be provided through the extended data storage system. In other words, by providing a data modeling template, a newly supported (added) data storage system can be utilized without a separate learning process. In particular, the data storage system can be expanded, maintained, and repaired, which reduces management cost. The new data storage system can be plugged into the existing data storage system independently and dynamically, and its service can be released. Therefore, a service using the added data storage system can be provided to the user without service interruption, the quality of service can be improved, and the stability of the system can be maintained.

The Big Data Platform-based data storage system expansion apparatus according to an embodiment of the present invention has been described above with reference to FIGS. 1 and 2. Hereinafter, a Big Data Platform-based data storage system expansion method according to an embodiment of the present invention will be described with reference to FIG. 3. FIG. 3 is a flowchart illustrating a method of expanding a data storage system based on a Big Data Platform according to an embodiment of the present invention.
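The role of the master global service registry can be sketched as a component that propagates a service's availability, and later its withdrawal, to every framework node. The sketch rests on assumptions: `FrameworkNode`, `MasterGlobalRegistry`, and their methods are hypothetical, and direct method calls stand in for whatever network protocol a real deployment would use.

```python
class FrameworkNode:
    """Hypothetical stand-in for one OSGi framework instance on one server."""

    def __init__(self, name):
        self.name = name
        self.available = set()  # services this node has been told it can use


class MasterGlobalRegistry:
    """Hypothetical master-side registry that keeps all nodes in step."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._origin = {}  # service name -> name of the node that added it

    def publish(self, service, origin):
        """Release a service added on one node to every node."""
        self._origin[service] = origin.name
        for node in self._nodes:
            node.available.add(service)

    def withdraw(self, service):
        """Propagate the discarding of a bundle so no node keeps a stale entry."""
        self._origin.pop(service, None)
        for node in self._nodes:
            node.available.discard(service)

    def origin_of(self, service):
        """Name of the node where the bundle was added, or None."""
        return self._origin.get(service)
```

Routing both publication and withdrawal through one master is what keeps the life cycle seen by each node consistent with the node where the bundle was actually added.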

As shown in FIG. 3, the Big Data Platform-based data storage system expansion method of the present invention implements a bundle of a data storage system to be newly added to a big data platform-based data storage system (S300).

For example, a data storage system to be newly added is implemented as a New DS (Data Store) bundle 210, and an implemented New DS (Data Store) bundle 210 is added to a data store plug-in of the OSGi framework.

The data modeling template 240 is registered (S301).

For example, the data modeling template 240 supporting data modeling is defined in correspondence with the added New DS (Data Store) bundle 210, and the defined data modeling template 240 is added to the template store of the service management unit 110.

The technical interface corresponding to the added New DS (Data Store) bundle 210 is registered in the local service registry 220 of the OSGi framework (S302).

For example, a technical interface corresponding to the added New DS (Data Store) bundle 210 is added to the local service registry 220 of the OSGi framework so as to be utilized in the data analysis program.

It is determined whether or not the local activation is performed (S303).

For example, after the implemented New DS (Data Store) bundle 210 has been added to the OSGi framework's data store plug-in and the data modeling template 240 defined for the added New DS (Data Store) bundle 210 has been added to the template store, it is determined whether the technical interface corresponding to the added New DS (Data Store) bundle 210 has been registered in the local service registry 220 of the OSGi framework.

As a result of the determination, if the technical interface corresponding to the added New DS (Data Store) bundle 210 is registered in the local service registry 220 of the OSGi framework, it is determined that local activation is complete, and the process proceeds to the OSGi master global service registry 230 (S304).

For example, the Big Data Platform-based data storage system expansion method manages the OSGi master node in order to apply an OSGi framework-based distributed system to the Big Data Platform-based data storage system.

That is, since the conventional OSGi framework operates on a single server, the conventional OSGi framework is extended to a distributed environment so that it can be applied to the big data storage and execution structure of a Big Data Platform-based data storage system, which runs on distributed servers.

It is determined whether the global activation is performed through the master node (S305).

For example, once the technical interface corresponding to the added New DS (Data Store) bundle 210 has been registered in the local service registry 220 of the OSGi framework, the registration process for service disclosure in the local (specific) OSGi framework is complete. The master node then performs global activation by notifying each OSGi framework (every distributed server) that the service of the New DS bundle 210 added to the local (specific) OSGi framework is available.

When the master node has finished notifying each OSGi framework (every distributed server) that the New DS bundle 210 added to the local (specific) OSGi framework is available, the extended data service of the Big Data Platform-based data storage system is released to each OSGi framework in the distributed environment through the OSGi master global service registry 230 (S306), and the life cycle of the specific OSGi framework is synchronized with the life cycle of each OSGi framework.
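The sequence S300 to S306 described above can be summarized as a single function. This is a minimal sketch under stated assumptions: plain dictionaries and sets stand in for the data store plug-in, template store, and registries; all names are hypothetical; and because the source text for S304 is incomplete, that step is modelled here simply as creating an entry in the master global registry.

```python
def expand_data_store(name, template, interface, local, global_registry, nodes):
    """Walk through S300 to S306 and return the labels of the completed steps."""
    done = []

    local["plugin"].add(name)              # S300: add the bundle to the plug-in
    done.append("S300")
    local["templates"][name] = template    # S301: register the modeling template
    done.append("S301")
    local["registry"][name] = interface    # S302: register the technical interface
    done.append("S302")

    # S303: local activation requires all three local registrations.
    locally_active = (
        name in local["plugin"]
        and name in local["templates"]
        and name in local["registry"]
    )
    if not locally_active:
        return done
    done.append("S303")

    # S304 (assumed): hand the service over to the master global registry.
    entry = {"notified": set(), "released": False}
    global_registry[name] = entry
    done.append("S304")

    # S305: the master notifies every framework; global activation needs all of them.
    for node in nodes:
        entry["notified"].add(node)
    if entry["notified"] != set(nodes):
        return done
    done.append("S305")

    entry["released"] = True               # S306: release the service everywhere
    done.append("S306")
    return done
```

Returning the list of completed steps makes it easy to see where the flow stopped if local or global activation did not succeed.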

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. Therefore, the scope of the present invention should not be limited by the illustrated embodiments, but should be determined by the scope of the appended claims and equivalents thereof.

110: Service management unit
120: Execution management unit
130: Data management unit
210: New DS bundle
220: Local service registry
230: OSGi master global service registry
240: Data modeling template

Claims

A method for expanding a data storage system based on a Big Data Platform, the method comprising:
adding a New DS (Data Store) bundle corresponding to a data storage system to be extended to a data store plug-in of a specific OSGi (Open Service Gateway Initiative) framework;
defining a data modeling template that corresponds to the added New DS (Data Store) bundle and supports data modeling, and adding the defined data modeling template to a template store;
adding an interface corresponding to the added New DS (Data Store) bundle to a local service registry of the specific OSGi framework; and
releasing the data service corresponding to the added New DS (Data Store) bundle to each server of the entire OSGi framework.