KR102084219B1

KR102084219B1 - Connected data architecture of datalake framework

Info

Publication number: KR102084219B1
Application number: KR1020180065490A
Authority: KR
Inventors: 차병래; 박선
Original assignee: 제노테크주식회사
Priority date: 2018-06-07
Filing date: 2018-06-07
Publication date: 2020-05-22
Also published as: KR20190143519A

Abstract

The present invention relates to a connected data architecture of a data lake framework that logically bundles a plurality of physically separated storages through software, so that data is managed systematically and structurally while interface-specific data is transmitted stably.
The invention comprises a data lake (100) for storing data; a cloud storage (200) for receiving data from the data lake (100) and backing it up; a data center (300) for receiving data from, or transmitting data to, the data lake (100); and a plurality of microstorages (400), each storing identification information in advance so that they can be distinguished from one another, and each transmitting that identification information to the data lake (100) in order to receive data from the data lake (100). The data lake (100) comprises a data lake communication unit (101) for communicating with at least one of the cloud storage (200), the data center (300), and the microstorages (400); a data storage unit (102) for storing and managing previously input data; a transfer function generation unit (103) for generating transfer functions that transmit data in a manner corresponding to the cloud storage (200), the data center (300), or the microstorages (400), respectively; a first data transmission unit (104) for transmitting data stored in the data storage unit (102) to the cloud storage (200) or the data center (300) using a method that overrides the transfer function generated by the transfer function generation unit (103); and a second data transmission unit (105) that, in order to transmit different data to each microstorage, uses a method that overloads the transfer function generated by the transfer function generation unit (103) to extract from the data storage unit (102) the data corresponding to the identification information of each microstorage (400) and to transmit it to the microstorage having that identification information.

Description

CONNECTED DATA ARCHITECTURE OF DATALAKE FRAMEWORK

The present invention relates to a connected data architecture of a data lake framework, and more particularly to a connected data architecture of a data lake framework that logically bundles a plurality of physically separated storages through software, so that data is managed systematically and structurally while interface-specific data is transmitted stably.

With the recent development of IT technology, use of the Internet and similar services within enterprises has increased, and enterprises now produce and consume large amounts of data. In addition, the value of an enterprise is gradually shifting toward its data.

Accordingly, enterprises have recognized the need for systems that build and analyze enterprise data in order to store and manage large amounts of data, and the trend is to build data warehouses, data silos, and the like.

A data warehouse is a practical methodology that efficiently integrates, coordinates, and manages the individual database management systems operated in a distributed manner within a large organization, and that provides the foundation for an efficient decision-making system. It consists of management hardware, management software, extraction, transformation, and sorting tools, a database marketing system, metadata, and end-user access and utilization tools.

As described in Registered Patent No. 10-1543506 (registration date: June 4, 2015), such a data warehouse includes an ODS (Operational Data Store) that cleanses part of a plurality of source data; a DW (Data Warehouse) that integrates the data of the ODS and creates reference relationships between related data; a plurality of data marts that generate multidimensional models, per analysis subject, for the data of the ODS or the DW based on predetermined business rules; and a table report that generates a plurality of tables per subject for one of the data marts. The table report uses the original performance and the adjusted performance, both relative to a reference performance, to generate two separate tables, an evaluation performance table and a balance sheet performance table, and also generates a performance basis information table, which holds the data underlying changes in performance, and a quality management information table. The adjusted performance may be formed by taking into account all of the following: account transfers, customer reclassification, negative margin, other offices, referred accounts, manual adjustments, binding non-performing MOUs, and employee performance adjustments.

However, because of the sheer volume and complexity of the data, a data warehouse carries a risk of failure and requires a large investment of cost and time. As the data handled by enterprises grows in volume and takes increasingly varied forms, more and more enterprises are adopting data lakes, which can process such data efficiently.

Rather than first defining a conventional database structure and then filling it with data that fits that structure, a data lake stores data of all kinds and then makes it available in the required format when it is needed.

A data lake makes it possible to collect and store all types of data at any scale and at low cost. It also supports securing data and preventing unauthorized access; cataloging, searching, and discovering related data in a central repository; and performing new types of data analysis.

In addition, as the data handled by enterprises grows in volume and variety, use cases that exploit big data are increasingly common.

A use case makes use of the big data of an enterprise or the like. It is a unit of distinct functionality provided to a class or system according to the importance of the messages exchanged between systems, and it combines the interaction of one or more outside actors with the actions executed by the system.

There is therefore a need for technology covering per-interface use cases so that the large volumes of data stored in such a data lake can be transmitted stably.

The present invention has been proposed to solve the problems described above. Its object is to provide a connected data architecture of a data lake framework that logically bundles, through software, a plurality of physically separated storages in which an enterprise or the like stores large volumes of data, so that data is managed systematically and structurally while interface-specific data is transmitted stably.

The problems to be solved by the present invention are not limited to those mentioned above; other problems not mentioned here will be clearly understood by those skilled in the art from the description below.

To achieve the above object, the connected data architecture of a data lake framework according to the present invention comprises a data lake (100) for storing data; a cloud storage (200) for receiving data from the data lake (100) and backing it up; a data center (300) for receiving data from, or transmitting data to, the data lake (100); and a plurality of microstorages (400), each storing identification information in advance so that they can be distinguished from one another, and each transmitting that identification information to the data lake (100) in order to receive data from the data lake (100). The data lake (100) comprises a data lake communication unit (101) for communicating with at least one of the cloud storage (200), the data center (300), and the microstorages (400); a data storage unit (102) for storing and managing previously input data; a transfer function generation unit (103) for generating transfer functions that transmit data in a manner corresponding to the cloud storage (200), the data center (300), or the microstorages (400), respectively; a first data transmission unit (104) for transmitting data stored in the data storage unit (102) to the cloud storage (200) or the data center (300) using a method that overrides the transfer function generated by the transfer function generation unit (103); and a second data transmission unit (105) that, in order to transmit different data to each microstorage, uses a method that overloads the transfer function generated by the transfer function generation unit (103) to extract from the data storage unit (102) the data corresponding to the identification information of each microstorage (400) and to transmit it to the microstorage having that identification information.

As described above, according to the present invention, a plurality of physically separated storages in which an enterprise or the like stores large volumes of data are logically bundled through software, and interface-specific data is transmitted stably while the data is managed systematically and structurally. This makes it possible to expand and back up the data space, and it also improves the smoothness of uploads and the ease of coding.

In addition, data can be transmitted efficiently by using overriding and overloading for each interface.

FIG. 1 shows a connected data architecture of a data lake framework according to an embodiment of the present invention.
FIG. 2 shows the data lake of the connected data architecture of the data lake framework according to an embodiment of the present invention.
FIG. 3 shows the Abyss Storage Cluster of the data lake of the connected data architecture of the data lake framework according to an embodiment of the present invention.
FIG. 4 shows cloud bursting and cloud spanning of the data lake of the connected data architecture of the data lake framework according to an embodiment of the present invention.
FIG. 5 shows the cloud storage of the connected data architecture of the data lake framework according to an embodiment of the present invention.
FIG. 6 shows the data center of the connected data architecture of the data lake framework according to an embodiment of the present invention.
FIG. 7 shows a microstorage of the connected data architecture of the data lake framework according to an embodiment of the present invention.

Hereinafter, the connected data architecture of a data lake framework according to the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 shows a connected data architecture of a data lake framework according to an embodiment of the present invention, FIG. 2 shows the data lake of that architecture, FIG. 3 shows the Abyss Storage Cluster of the data lake, and FIG. 4 shows cloud bursting and cloud spanning of the data lake.

FIG. 5 shows the cloud storage of the connected data architecture of the data lake framework according to an embodiment of the present invention, FIG. 6 shows the data center of that architecture, and FIG. 7 shows a microstorage of that architecture.

In assigning reference numerals to the components in the drawings, the same components are given the same numerals wherever possible, even when they appear in different drawings, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention are omitted. Directional terms such as 'upper', 'lower', 'front', 'rear', 'leading end', 'forward', and 'trailing end' are used in relation to the orientation of the disclosed drawing(s). Because the components of embodiments of the present invention can be positioned in various orientations, the directional terms are used for illustration and are not limiting.

As shown in FIG. 1, the connected data architecture of a data lake framework according to a preferred embodiment of the present invention comprises a data lake (100) for storing data; a cloud storage (200) for receiving data from the data lake (100) and backing it up; a data center (300) for receiving data from, or transmitting data to, the data lake (100); and a plurality of microstorages (400) in which identification information is stored in advance in order to receive data from the data center (300).

The connected data architecture of the data lake framework defines and designs the interface between the data lake (100) and the cloud storage (200), the interface between the data lake (100) and the data center (300), and the interface between the data lake (100) and the microstorages (400).

As shown in FIG. 2, the data lake (100) comprises a data lake communication unit (101) for communicating with the cloud storage (200), the data center (300), or the microstorages (400); a data storage unit (102) for storing and managing previously input data; a transfer function generation unit (103) for generating transfer functions corresponding to the cloud storage (200), the data center (300), or the microstorages (400) in order to transmit data to them; a first data transmission unit (104) for transmitting data stored in the data storage unit (102) to the cloud storage (200) or the data center (300) using a method that overrides the transfer function generated by the transfer function generation unit (103); and a second data transmission unit (105) for extracting from the data storage unit (102) the data corresponding to the identification information of each microstorage (400) and transmitting it, using a method that overloads the transfer function generated by the transfer function generation unit (103).

The data lake (100) is built on a large-capacity Abyss Storage Cluster for SMBs configured as shown in FIG. 3, and hardware prototype development and mass production of the Abyss Storage Cluster are practically feasible. In addition, to improve the performance of Abyss Storage, performance tests for each type of disk medium in the storage, bonding to accelerate the storage's internal network, and domestic and international network traffic tests over the network have been completed.

To build an enterprise-wide data lake that captures, processes, and analyzes large volumes of data and provides the results to users or to systems that consume the data, the data lake (100) may comprise a physical layer, a distributed storage layer, a security layer, a data acquisition layer, a messaging layer, an ingestion layer, a Lambda Architecture, and a serving layer.

Here, as shown in FIG. 4, the data lake (100) provides cloud bursting and cloud spanning through the relationship between the metadata of the Lambda Architecture and the content layer.

Cloud bursting is an application deployment model used in hybrid (mixed) cloud environments: when the computing capacity of the worker device (100) is exceeded, the excess demand is automatically directed to a public cloud so that the application can continue to run.
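The bursting behavior described above can be sketched as a simple placement rule. This is a minimal illustration, not the patented implementation: the `BurstingScheduler` class, its `local_capacity` unit (a count of concurrent jobs), and the `"local"`/`"public"` labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BurstingScheduler:
    """Hypothetical scheduler: fills local capacity first, then bursts."""

    local_capacity: int  # assumed unit: number of concurrent jobs
    local_jobs: list[str] = field(default_factory=list)
    public_cloud_jobs: list[str] = field(default_factory=list)

    def submit(self, job: str) -> str:
        # Run locally while capacity remains; otherwise send the excess
        # demand to the public cloud so the job can still run.
        if len(self.local_jobs) < self.local_capacity:
            self.local_jobs.append(job)
            return "local"
        self.public_cloud_jobs.append(job)
        return "public"


scheduler = BurstingScheduler(local_capacity=2)
placements = [scheduler.submit(f"job-{i}") for i in range(3)]
print(placements)  # the third job exceeds local capacity and bursts
```

A real deployment would measure capacity in CPU, memory, or storage rather than a job count; the count only keeps the overflow rule visible.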

Cloud spanning is a delivery model in which application components that require large amounts of computing resources are deployed across multiple cloud environments simultaneously; it allows multiple computers to be connected so that they cooperate.

The data lake communication unit (101) may use a wireless network to communicate with at least one of the cloud storage (200), the data center (300), and the microstorages (400). The wireless network may be Wi-Fi, LTE (Long Term Evolution), or the like.

The data storage unit (102) stores and manages data; a user may input data in advance to have it stored.

The data storage unit (102) may also take in and store data transmitted from any one of the cloud storage (200), the data center (300), and the microstorages (400).

The transfer function generation unit (103) generates a transfer function corresponding to the cloud storage (200), the data center (300), or the microstorage (400) so that data can be transmitted. That is, the transfer function generation unit (103) generates the transfer function used to transmit data according to which of the cloud storage (200), the data center (300), or the microstorage (400) is connected for receiving or transmitting data.
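One way to picture the transfer function generation unit (103) is as a factory that returns a function matched to the connected target. The sketch below is illustrative only: `make_transfer_function`, the three target labels, and the string result standing in for real protocol handling are assumptions, not details from the patent.

```python
from typing import Callable

TransferFn = Callable[[bytes], str]

# Hypothetical labels for the three kinds of connected target.
TARGET_KINDS = {"cloud_storage", "data_center", "micro_storage"}


def make_transfer_function(target_kind: str) -> TransferFn:
    """Return a transfer function matched to the connected target."""
    if target_kind not in TARGET_KINDS:
        raise ValueError(f"unknown target kind: {target_kind}")

    def transfer(payload: bytes) -> str:
        # Placeholder for target-specific transmission logic.
        return f"{target_kind}: sent {len(payload)} bytes"

    return transfer


send_to_cloud = make_transfer_function("cloud_storage")
print(send_to_cloud(b"backup"))  # prints "cloud_storage: sent 6 bytes"
```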

The first data transmission unit (104) transmits data to the cloud storage (200) or the data center (300). To do so, it uses a method that overrides the transfer function that the transfer function generation unit (103) generated for transmitting data to the cloud storage (200) or the data center (300).

Here, overriding means redefining, in a child class, a method that exists in the parent class.
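Overriding as defined above can be illustrated with a small class hierarchy. The class names (`TransferFunction`, `CloudStorageTransfer`, `DataCenterTransfer`) and the returned strings are hypothetical; the sketch only shows a child class redefining a parent method while the call site stays the same.

```python
class TransferFunction:
    """Hypothetical parent class holding the generated transfer function."""

    def transfer(self, data: bytes) -> str:
        return f"generic transfer of {len(data)} bytes"


class CloudStorageTransfer(TransferFunction):
    def transfer(self, data: bytes) -> str:  # overrides the parent method
        return f"backup of {len(data)} bytes to cloud storage"


class DataCenterTransfer(TransferFunction):
    def transfer(self, data: bytes) -> str:  # overrides the parent method
        return f"upload of {len(data)} bytes to data center"


def first_data_transmitter(target: TransferFunction, data: bytes) -> str:
    # The call is identical for every target; the subclass decides behavior.
    return target.transfer(data)


results = [
    first_data_transmitter(target, b"abc")
    for target in (CloudStorageTransfer(), DataCenterTransfer())
]
print(results)
```

The transmitter needs no branching on the target type, which is one plausible reading of the "ease of coding" benefit the description claims.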

When data is transmitted over a connection to the cloud storage (200) through the first data transmission unit (104), expansion of the data space, safe data transmission, and data security are required. In particular, various services such as intelligent analysis or prediction using the resources of the cloud storage (200) can be supported.

When data is transmitted over a connection to the data center (300) through the first data transmission unit (104), expansion and backup of the data space, safe data transmission, and security are essential.

The second data transmission unit (105) connects to the microstorages (400) and transmits data to them.

The second data transmission unit (105) extracts data from the data storage unit (102) according to the identification information of each microstorage (400), and transmits the extracted data to the corresponding microstorage (400) using a method that overloads the transfer function generated by the transfer function generation unit (103).

Here, overloading means that methods with the same name can be defined with different parameters; the type and number of the parameters may differ.

Through the second data transmission unit (105), different data can be transmitted to each microstorage.
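The overloading described above selects among same-named methods by parameter type. Python has no built-in method overloading, so the sketch below approximates it with `functools.singledispatchmethod`, which dispatches on the type of the first argument. The identifier classes (`NavigationId`, `MedicalDeviceId`) and the returned strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import singledispatchmethod


@dataclass(frozen=True)
class NavigationId:
    user: str


@dataclass(frozen=True)
class MedicalDeviceId:
    user: str


class SecondDataTransmitter:
    """Hypothetical transmitter: one method name, one body per ID type."""

    @singledispatchmethod
    def transfer(self, device_id) -> str:
        # Fallback when no variant is registered for this ID type.
        raise NotImplementedError(type(device_id).__name__)

    @transfer.register
    def _(self, device_id: NavigationId) -> str:
        return f"map data for {device_id.user}"

    @transfer.register
    def _(self, device_id: MedicalDeviceId) -> str:
        return f"health records for {device_id.user}"


transmitter = SecondDataTransmitter()
print(transmitter.transfer(NavigationId("user-a")))
print(transmitter.transfer(MedicalDeviceId("user-b")))
```

In a language with native overloading, such as Java, the same idea would be written as several `transfer` methods with different parameter lists, resolved at compile time.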

As shown in FIG. 5, the cloud storage (200) comprises a cloud communication unit (201) for communicating with the data lake (100), and a backup unit (202) for backing up the data received from the data lake (100) through the cloud communication unit (201).

The cloud storage (200) is a data storage model in which digital data is stored in logical pools, the physical storage spans multiple servers, and the physical environment is typically owned and managed by a hosting company. Such cloud providers are responsible for keeping the data available and accessible at all times and for keeping the physical environment protected and running. Individuals or organizations buy or lease storage capacity from the provider to store user, organization, and application data. The cloud storage (200) therefore mainly holds large volumes of data and can be accessed anytime and anywhere over an Internet connection.

The cloud communication unit (201) may use a wireless network such as Wi-Fi or the Internet to communicate with the data lake (100).

The backup unit (202) backs up the data received from the data lake (100). That is, the backup unit (202) copies the data received from the data lake (100) through the cloud communication unit (201), backs it up, and then stores and manages it.
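The copy-then-store behavior of the backup unit (202) can be sketched as follows. The `BackupUnit` class, its key-based interface, and the use of an in-memory dictionary are assumptions for the example; the point is only that the backup holds its own copy of the received data.

```python
import copy


class BackupUnit:
    """Hypothetical backup unit: copies received data, then stores it."""

    def __init__(self) -> None:
        self._backups: dict[str, dict] = {}

    def back_up(self, key: str, received: dict) -> None:
        # Deep-copy so later changes on the sender's side cannot
        # alter the stored backup.
        self._backups[key] = copy.deepcopy(received)

    def restore(self, key: str) -> dict:
        # Hand out a copy so callers cannot modify the backup either.
        return copy.deepcopy(self._backups[key])


unit = BackupUnit()
record = {"sensor": "s1", "values": [1, 2, 3]}
unit.back_up("r1", record)
record["values"].append(4)  # mutate the original after the backup
print(unit.restore("r1"))   # the backup still holds the original values
```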

As shown in FIG. 6, the data center (300) comprises a center communication unit (301) for communicating with the data lake (100); a data management unit (302) for storing and managing data received from the data lake (100) through the center communication unit (301); and a data delivery unit (303) for delivering previously input new data to the data lake (100).

The data center (300) stores and manages big data, that is, the large body of data used mainly by enterprises and the like.

The center communication unit (301) communicates with the data lake (100) over a wireless network such as Wi-Fi or the Internet.

When data is received from the data lake (100) through the center communication unit (301), it is passed to the data management unit (302).

The data management unit (302) stores and manages the data received from the data lake (100).

In addition, when a user enters new data, the data management unit (302) may also store and manage that new data.

The data delivery unit (303) delivers the previously input new data to the data lake (100) through the center communication unit (301).

As shown in FIG. 7, each microstorage (400) comprises a micro communication unit (401) for communicating with the data lake (100); an identification information unit (402) for generating identification information that distinguishes it from other microstorages (400); an identification information transmission unit (403) for transmitting the identification information in order to receive data from the data lake (100); and a data accumulation unit (404) for receiving, from the data lake (100) that received the identification information from the identification information transmission unit (403), the data corresponding to that identification information, and for accumulating and storing it.

The microstorage (400) may be an electronic device such as a smartphone, a security device, a medical device, a navigation device, or an IoT device.

상기 마이크로통신부(401)는, 와이파이 등과 같은 무선네트워크를 이용하여 상기 데이터레이크(100)와 통신한다.The micro communication unit 401 communicates with the data lake 100 using a wireless network such as Wi-Fi.

상기 식별정보부(402)는, 상기 마이크로스토리지(400)를 식별하기 위한 식별정보를 입력하여 생성한다. 이때, 상기 식별정보는 마이크로스토리지의 종류, 사용자정보 등이 포함되어 생성될 수 있다.The identification information unit 402 is generated by inputting identification information for identifying the micro storage 400. At this time, the identification information may be generated by including the type of micro storage, user information, and the like.

The generated identification information is transmitted to the data lake 100 through the identification information transmission unit 403. That is, the identification information transmission unit 403 requests data by transmitting the generated identification information to the data lake 100.

Because the identification information transmission unit 403 transmits the identification information to the data lake 100, the micro storage 400 can receive the data it needs.

The data accumulation unit 404 cumulatively stores the data received from the data lake 100. For example, when the micro storage 400 is a navigation device, the identification information unit 402 generates identification information indicating that the device is a navigation device, the identification information transmission unit 403 transmits it to the data lake 100, and the data accumulation unit 404 receives from the data lake 100 the data corresponding to a navigation device, such as maps, and stores it cumulatively.
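The request-and-accumulate flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation disclosed here: the class names `DataLake` and `MicroStorage`, the fields of `Identification`, and the in-memory catalog keyed by device type are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identification:
    """Identification information (402). Field names are hypothetical."""
    device_type: str
    user: str


class DataLake:
    """Stand-in for the data lake (100); the catalog layout is an assumption."""

    def __init__(self, catalog):
        # Maps a device type to the data items prepared for that type.
        self._catalog = catalog

    def request(self, ident):
        # Return the data corresponding to the received identification information.
        return list(self._catalog.get(ident.device_type, []))


@dataclass
class MicroStorage:
    """Stand-in for a micro storage (400)."""
    ident: Identification
    accumulated: list = field(default_factory=list)

    def fetch(self, lake):
        # 403: send the identification information to request data.
        received = lake.request(self.ident)
        # 404: append to what is already stored instead of overwriting it.
        self.accumulated.extend(received)
        return received


lake = DataLake({"navigation": ["map_tile_seoul"], "medical": ["dosage_table"]})
nav = MicroStorage(Identification("navigation", "user-1"))
nav.fetch(lake)
nav.fetch(lake)
```

After the two calls, `nav.accumulated` holds the items from both transfers, which mirrors the cumulative storage performed by the data accumulation unit 404.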

The connected data architecture of the data lake framework configured as described above can logically bundle various physically partitioned storages by software, and can transmit or receive data per interface between the data lake and the cloud storage, the data center, the micro storages, and the like.

In other words, the connected data architecture of the data lake framework logically bundles, by software, the many physically partitioned storages in which a company or similar organization stores large volumes of data. It manages the data systematically and structurally while transmitting data stably for each interface, which expands and backs up the data space and makes uploading smoother and coding easier.
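The interface-specific transmission summarized above, which the claims describe in terms of overriding and overloading a generated transmission function, can be approximated with the following Python sketch. It rests on several assumptions: all class names and payload strings are hypothetical, and because Python has no native method overloading, `functools.singledispatchmethod` is swapped in to dispatch on the type of the identification information.

```python
from functools import singledispatchmethod


class TransferFunction:
    """Base transmission function, standing in for what unit 103 generates."""

    def transmit(self, data):
        raise NotImplementedError


class CloudTransfer(TransferFunction):
    def transmit(self, data):
        # Override for the cloud storage (200).
        return f"backup:{data}"


class CenterTransfer(TransferFunction):
    def transmit(self, data):
        # Override for the data center (300).
        return f"center:{data}"


class NavigationId:
    """Hypothetical identification type for a navigation device."""


class MedicalId:
    """Hypothetical identification type for a medical device."""


class MicroTransfer:
    """Overload-like dispatch on the kind of identification information."""

    @singledispatchmethod
    def transmit(self, ident):
        raise TypeError(f"unsupported identification: {type(ident).__name__}")

    @transmit.register(NavigationId)
    def _(self, ident):
        return "map data"

    @transmit.register(MedicalId)
    def _(self, ident):
        return "medical records"


# Same call, different behavior per target (override).
results = [t.transmit("dataset-1") for t in (CloudTransfer(), CenterTransfer())]

# Same method name, different data per identification type (overload-like).
nav_payload = MicroTransfer().transmit(NavigationId())
```

The first transmission path picks the behavior from the receiving target, while the second picks the data from the identification information that the micro storage sent.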

The embodiments of the present invention described above and illustrated in the drawings should not be construed as limiting the technical idea of the present invention. The scope of protection of the present invention is limited only by the matters recited in the claims, and a person of ordinary skill in the art may improve and modify the technical idea of the present invention in various forms. Accordingly, such improvements and modifications fall within the scope of protection of the present invention insofar as they are obvious to a person of ordinary skill in the art.

100: data lake 101: data lake communication unit
102: data storage unit 103: transmission function generation unit
104: first data transmission unit 105: second data transmission unit
200: cloud storage 201: cloud communication unit
202: backup unit 300: data center
301: center communication unit 302: data management unit
303: data transfer unit 400: micro storage
401: micro communication unit 402: identification information unit
403: identification information transmission unit 404: data accumulation unit

Claims

A connected data architecture of a data lake framework, comprising: a data lake 100 for storing data; a cloud storage 200 for receiving and backing up data from the data lake 100; a data center 300 for receiving data from or transmitting data to the data lake 100; and a plurality of micro storages 400, each storing identification information in advance so as to be distinguishable from one another and transmitting the identification information to the data lake 100 in order to receive data of the data lake 100,
wherein the data lake 100 comprises: a data lake communication unit 101 for communicating with at least one of the cloud storage 200, the data center 300, and the micro storage 400; a data storage unit 102 for storing and managing previously input data; a transmission function generation unit 103 for generating a transmission function that transmits data so as to correspond respectively to the cloud storage 200, the data center 300, or the micro storage 400; a first data transmission unit 104 for transmitting the data stored in the data storage unit 102 to the cloud storage 200 or the data center 300 using a method that overrides the transmission function generated by the transmission function generation unit 103; and a second data transmission unit 105 for extracting, from the data storage unit 102, the data corresponding to the identification information of each micro storage 400 and transmitting it to the micro storage having that identification information, using a method that overloads the transmission function generated by the transmission function generation unit 103 so that different data is transmitted to each micro storage.