KR20200092507A

KR20200092507A - An installation system and installation method of a distributed processing system for mass data processing applicable to various clouds and its distributed processing system

Info

Publication number: KR20200092507A
Application number: KR1020190004793A
Authority: KR
Inventors: 감민재; 김비; 김동민; 이홍은; 이소영; 김유하; 서영준
Original assignee: (주) 익투스지노믹스
Priority date: 2019-01-14
Filing date: 2019-01-14
Publication date: 2020-08-04
Also published as: KR102257012B1

Abstract

The present invention relates to an installation system for a distributed processing system for processing large volumes of data that is applicable to various clouds, and more particularly to an installation system, an installation method, and the distributed processing system itself, all based on various cloud environments. The distributed processing system is not dependent on a specific cloud system, can be installed in various cloud environments from an installation package that can be stored on various types of media, and processes data at high speed at the time the data needs to be processed. It comprises: at least one master node installed first from the installation package; at least one storage node for storing processed data; and at least one calculation node that is created through the master node when data to be analyzed is input and that processes all or a part of the data.

Description

An installation system and installation method of a distributed processing system for mass data processing applicable to various clouds and its distributed processing system

The present invention relates to an installation system and an installation method for a distributed processing system for processing large-capacity data that is applicable to various clouds, and to the distributed processing system itself. More specifically, depending on the location of the physical system, the various cloud environments may include a public cloud, a private cloud, a hybrid cloud, or a cloud provided by one or more commercial cloud service providers. The present invention relates to an installation system and an installation method in which an installation system is copied to a client from an installation package that can be stored on a storage medium in various forms and then installs a distributed processing system for processing large-capacity data applicable to various clouds, and to that distributed processing system.

A cloud refers to an arrangement in which software and data are stored on central computers connected to the Internet so that the data can be used anytime and anywhere over the Internet.

Clouds take several forms. By adoption and deployment model, they can be divided into the private cloud, operated solely for a single organization; the public cloud, openly available over an open network; and the hybrid cloud, a combination in which two or more clouds are bound together.

Cloud computing technology is currently applied in various industries, and cloud systems in particular are used as platforms for processing large amounts of data. Compared with an existing in-house system, a cloud-based system can increase system availability by making the required computing resources available when they are needed, and because it is not tied to an in-house system it can respond more flexibly to rapidly changing external and internal environments.

In addition, in a cloud computing environment, costs are incurred according to the computing resources used for data processing and the usage time, so the user pays only for as much of the cloud computing service as is used. However, building a cloud-based distributed processing system for large-capacity data that benefits from these advantages on a specific cloud takes considerable time and cost.

Korean Patent Laid-Open Publication No. 10-2014-0065545 (Multi-cloud deployment management system and method; publication date: 2015.12.09)
Korean Patent Laid-Open Publication No. 10-2018-0063240 (System and method for building, optimizing, and implementing infrastructure in a cloud-based computing environment; publication date: 2018.06.11)
Korean Registered Patent No. 10-1822093 (Apparatus and method for building a cloud system; registration date: 2018.01.19)

The present invention is intended to solve the problems of the prior art described above. It provides an installation system and an installation method for a distributed processing system for large-capacity data processing that is not dependent on a specific cloud system and can be applied to various clouds within a short time at the moment data processing is needed, together with that distributed processing system.

An embodiment of the present invention provides an installation system and an installation method for a distributed processing system for large-capacity data processing that is not dependent on a specific cloud system and can be applied to various clouds within a short time at the moment data processing is needed, together with that distributed processing system, as follows.

The installation system of the distributed processing system for large-capacity data processing applicable to various clouds comprises: a package configuration information unit containing the reference information files needed when building and compiling the source code and binaries of the distributed processing system that have been copied from the installation package to the client; a package driver that, using the package configuration information unit, creates and deletes a container providing an environment for building and compiling the source code and binaries of the distributed processing system, and runs the build and compilation of the source code and binaries in the container; the container, which has an environment for building and compiling the source code and binaries of the distributed processing system to suit various cloud environments; a control unit that handles the drive file generated when the source code and binaries of the distributed processing system are built and compiled in the container environment, and that further provides user interface means by which the user selects and requests a cloud; a package storage unit that stores the drive file once its source code and binaries have been built and compiled in the container; and an external storage from which source code and binaries can be obtained from outside, in addition to the package storage unit.

The installation method of the distributed processing system for large-capacity data processing applicable to various clouds comprises: a first step in which the initial installation system, comprising the package configuration information unit, the package driver, the control unit, and the package storage unit that form part of the installation system, together with the source code and binaries of the distributed processing system, is copied from the installation package to the client;

a second step in which, to install the distributed processing system through the initial installation system, a container is created that has the operating system environment of the target clouds, or a compatible operating system environment, together with an environment for building and compiling; the source code of the distributed processing system copied into the package storage unit and the source code and binaries of required external components are copied or linked into the container; the source code and binaries are built, compiled, and compressed in the container; the resulting drive file is transferred to the package storage unit; and the container is then deleted. The stored drive file is a first drive file comprising a master node drive file, which contains the calculation node drive file to be installed on the various clouds, and a storage node drive file;

a third step in which a cloud is selected through the user interface included in the control unit, and a master node and a storage node are allocated in the selected cloud computing environment;

a fourth step in which, to provide the master system and the storage system on the master node and the storage node allocated in the selected cloud, the package driver turns the network information of the master node and the storage node, and other information required for operation, into configuration files for the master system and the storage system to be provided in the cloud selected through the user interface of the control unit, decompresses the compressed first drive file, copies the configuration files into a specific folder of the master drive file and of the storage drive file respectively, and compresses them again to produce a second drive file;

a fifth step in which the master drive file and the storage drive file of the second drive file produced in the fourth step are transmitted to the master node and the storage node, decompressed, and installed, so that the master system and the storage system are provided and together constitute a first distributed processing system;

a sixth step in which, when the master system and the storage system are operating normally and data and tasks are input to the master system by the user, the calculation node resources and the number of calculation nodes are allocated so that the data and tasks can be processed;

a seventh step in which the calculation node drive file included in the master system is transmitted to the calculation node allocated in the sixth step, and the transmitted calculation node drive file is decompressed and installed, so that the calculation system is provided. Whether the installation has completed can be checked through the user interface included in the control unit; and

an eighth step in which, once the calculation system is provided and first starts, it uses the stored network information of the master system to send a success or failure message to the master system, and the master system either registers the calculation system upon receiving its success message or reinstalls the calculation system.

The distributed processing system for large-capacity data processing applicable to various clouds comprises a first distributed processing system, made up of one or more master systems and storage systems that are installed first through the installation method and the installation system, and a second distributed processing system, made up of the master system, one or more calculation systems created to process all or part of the data when data to be analyzed and tasks are input by the user, and one or more storage systems that store the processed data. The second distributed processing system is the distributed processing system for large-capacity data processing applicable to various clouds of the present invention.

According to the present invention, the distributed processing system for large-capacity data processing applicable to various clouds can be operated without being limited to a specific cloud. At the moment large-capacity data needs to be processed, it can be installed and operated immediately, without a separate complicated installation procedure and without customization for the target operating environment.

In addition, because the distributed processing system of the present invention is installed at the point when data analysis is needed, there is no need, unlike in the prior art, to keep the systems involved in operating the distributed cluster running at all times, which reduces cost.

Furthermore, the distributed processing system of the present invention consists only of master nodes, storage nodes, and calculation nodes and needs no additional management system, which further reduces the operating cost of the distributed cluster.

FIG. 1 is a block diagram showing the overall configuration of an installation system of a distributed processing system for large-capacity data processing applicable to various clouds, and of that distributed processing system, according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a drive file according to an embodiment of the present invention.
FIG. 3 is an overall flowchart of a method for installing a distributed processing system for large-capacity data processing applicable to various clouds according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of an installation system of a distributed processing system for large-capacity data processing applicable to various clouds according to an embodiment of the present invention.
FIG. 5 is a diagram showing the order in which the packages constituting the source code and binaries are built and compiled according to an embodiment of the present invention.
FIGS. 5A to 5C are diagrams showing a reference information file for a package, written in JSON syntax, according to an embodiment of the present invention.
FIG. 6 is a flowchart showing a method of generating a first drive file through the installation system according to an embodiment of the present invention.
FIG. 7 is a block diagram showing a system configuration for installing the distributed processing system by means of the first drive file generated through the installation system according to an embodiment of the present invention.
FIGS. 8 and 9 are diagrams illustrating the user interface means included in the control unit according to an embodiment of the present invention.
FIG. 10 is a flowchart showing the operation of a system for generating and installing a second drive file and a first distributed processing system by means of the first drive file generated through the installation system according to an embodiment of the present invention.
FIG. 11 is a block diagram of a second distributed processing system according to an embodiment of the present invention.
FIG. 12 is a flowchart showing the operation by which the master system creates a calculation system according to an embodiment of the present invention.
FIGS. 13 to 16 are diagrams showing an embodiment to which the distributed processing system, installed on various clouds through the installation system according to an embodiment of the present invention, is applied.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is directly connected but also the case where it is electrically connected with another element in between.

In addition, throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. As used throughout this specification, the terms "step of (doing) ~" and "step of ~" do not mean "step for ~".

Furthermore, the components shown in the embodiments of the present invention are illustrated independently in order to represent different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, each component is listed separately for convenience of description, and at least two of the components may be combined into one component, or one component may be divided into a plurality of components that perform the function. Both the integrated and the separated embodiments of these components fall within the scope of the present invention as long as they do not depart from its essence.

Preferred embodiments of the present invention are described in detail with reference to the accompanying drawings; technical matters that are already well known are omitted or abbreviated for brevity.

A preferred embodiment of the present invention is now described in more detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the overall configuration of an installation system of a distributed processing system for large-capacity data processing applicable to various clouds, and of that distributed processing system, according to an embodiment of the present invention.

As shown in FIG. 1, the overall configuration of the present invention comprises an installation package 100, a client 200, and various clouds 1000. FIG. 1 shows one installation package and one client connected to one or more clouds, but this is merely an illustrative example, and the scope of the present invention is not limited by the number of installation packages, clients, or clouds.

First, the installation package 100 is a specific program for copying the installation system that implements the present invention to the client 200. More specifically, it may be an application program for a commonly used desktop PC or server station, or a web service, that is capable of functions such as copying, building, and compiling the source of specific software.

The installation system is copied from the installation package 100 to the client 200. The installation system comprises a package configuration information unit 210, a package driver 220, a package storage unit 230, a control unit 240, a container 250, a drive file 260, and an external storage 270.

First, the package configuration information unit 210 contains the reference information file used when building and compiling the source code and binaries of the drive file 260. The information recorded in the reference information file describes the packages that make up the drive file 260 for installing the distributed processing system of the present invention. It includes the location of each package's source code and binaries and information on the prerequisite packages that must be built or compiled before the package itself. The location may be a GitHub, FTP, or HTTP address or a remote or local storage location. For a GitHub address, branch information may be added so that a particular version is installed, and when the original source is stored as a compressed file, additional information for verifying that it has been decompressed correctly may also be included.
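As an illustration of the reference information file described above, the following is a minimal sketch in Python. The JSON key names (`packages`, `name`, `location`, `branch`, `requires`, `sha256`), the package names, and the locations are all hypothetical; the specification only states that such a file carries location information, optional branch information, prerequisite-package information, and information for verifying decompression. The sketch derives a build order in which prerequisite packages come first.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical reference information file. Every key name, package name, and
# location below is an assumption made for illustration only.
REFERENCE_INFO = json.loads("""
{
  "packages": [
    {"name": "master", "location": "file:///packages/master.tar.gz",
     "requires": ["common-lib"], "sha256": "<expected digest>"},
    {"name": "common-lib", "location": "https://github.com/<owner>/<repo>",
     "branch": "release-1.0", "requires": []},
    {"name": "storage", "location": "ftp://example.invalid/storage.tar.gz",
     "requires": ["common-lib"]}
  ]
}
""")


def build_order(reference_info):
    """Return package names so that every prerequisite precedes its dependents."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {p["name"]: set(p.get("requires", []))
             for p in reference_info["packages"]}
    return list(TopologicalSorter(graph).static_order())


order = build_order(REFERENCE_INFO)
```

A dependency cycle in such a file would raise `graphlib.CycleError`, which is one reasonable way to reject an inconsistent reference information file before any build starts.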

The package driver 220 runs at the user's request. It reads the information on the source code and binaries of the drive file 260 to be installed on the various clouds from the reference information file of the package configuration information unit 210, creates and deletes the container 250 that provides an environment for building and compiling the package source code and binaries stored in the package storage unit 230, and runs the build and compilation of that source code and those binaries in the created container 250.

The package storage unit 230 stores the source code and binaries used to generate the first drive file 260a, which is a drive file 260, and stores the drive file 260 produced once that source code and those binaries have been built and compiled. It also holds specific files, such as an executable that converts the network information of the master node and of the storage node allocated in the user-selected cloud into a specific configuration file, and it stores the second drive file generated by means of those files.
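The generation of a second drive file from a first drive file can be sketched as follows, in Python. This is only an illustration under stated assumptions: the archive format (`tar.gz`), the folder name `conf`, the file name `network.json`, and the function name `make_second_drive_file` are hypothetical and do not come from the specification, which says only that a configuration file is copied into a specific folder and the result is compressed again.

```python
import json
import tarfile
import tempfile
from pathlib import Path


def make_second_drive_file(first_archive: Path, out_archive: Path,
                           network_info: dict, config_dir: str = "conf") -> None:
    """Unpack a first drive file, add a configuration file, and repack it."""
    with tempfile.TemporaryDirectory() as work:
        # The first drive file is assumed to be trusted (it was built locally).
        with tarfile.open(first_archive, "r:gz") as tar:
            tar.extractall(work)
        # Write the node network information as a configuration file
        # (folder and file name are assumptions).
        target = Path(work) / config_dir
        target.mkdir(parents=True, exist_ok=True)
        (target / "network.json").write_text(json.dumps(network_info, indent=2))
        # Compress everything again to obtain the second drive file.
        with tarfile.open(out_archive, "w:gz") as tar:
            for item in sorted(Path(work).iterdir()):
                tar.add(item, arcname=item.name)
```

Working in a temporary directory keeps the first drive file untouched, so the same first drive file can be reused for a different cloud selection later.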

The control unit 240 handles the user's selection of, and request to, a cloud. It provides a means by which the user can select a cloud directly through the user interface 20 presented on the display device 10. The control unit 240 also holds information on various cloud computing environments.

The container 250 is created when needed and has an environment for building and compiling the source code and binaries of the packages of the drive file 260 to be installed on the various clouds. The build and compilation in the container 250 are carried out so as to suit the various cloud environments, without requiring environment information about the computing environment in which the result will be installed, and the container is deleted when the work is finished. The package driver 220 configures the container 250 to be identical to the computing environment in which the system may ultimately be installed. Although this environment physically occupies a certain space on the client 200, it is logically and systemically isolated.
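The create, build, store, and delete lifecycle of the container can be sketched as follows. This is a simplified Python illustration, not the implementation of the invention: a temporary directory stands in for a real container, the "build" is reduced to archiving the sources, and the names `build_container` and `build_drive_file` are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def build_container(target_os: str):
    """Stand-in for container creation and deletion (a temporary directory)."""
    root = Path(tempfile.mkdtemp(prefix=f"build-{target_os}-"))
    try:
        yield root
    finally:
        # The container is deleted when the work ends, even if the build fails.
        shutil.rmtree(root, ignore_errors=True)


def build_drive_file(sources: dict, package_store: Path, target_os: str) -> Path:
    """Copy sources into the container, 'build' them, and keep only the result."""
    with build_container(target_os) as root:
        src = root / "src"
        src.mkdir()
        for name, text in sources.items():
            (src / name).write_text(text)
        # Placeholder for the real build and compile step: archive the tree.
        archive = shutil.make_archive(str(root / "drive_file"), "gztar",
                                      root_dir=src)
        # Only the drive file leaves the container, into the package store.
        return Path(shutil.copy(archive, package_store))
```

Wrapping the lifecycle in a context manager is one way to guarantee that nothing but the stored drive file survives the build.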

The drive file 260 is used to install the distributed processing system of the present invention. It is the result, stored in the package storage unit, of building and compiling the source code and binaries of the packages for the drive file. The drive file 260 comprises a master node drive file 3000, a storage node drive file 4000, and a calculation node drive file 5000. The drive file 260 may also exist as a compressed file and may be stored on various storage media. At this stage, the master node drive file 3000, the storage node drive file 4000, and the calculation node drive file 5000 lack only the configuration information that is needed when each is installed on its assigned node and the nodes are interconnected.

The external storage 270 is connected from outside the client 200. When the source code and binaries of a package that needs to be compiled and built are not in the package storage unit 230, they are obtained from the external storage 270 and stored in the package storage unit 230, which supports the build and compilation. The external storage may take the form of a website, a program, or the like.
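The fallback from the package storage unit to the external storage can be illustrated with a short Python sketch. The dictionary standing in for the package storage unit, the `fetch_external` callable, and the function name `resolve_source` are assumptions introduced here for illustration.

```python
from typing import Callable, Dict


def resolve_source(name: str, package_store: Dict[str, bytes],
                   fetch_external: Callable[[str], bytes]) -> bytes:
    """Return a package's source, pulling it into the store on a miss."""
    if name not in package_store:
        # Not available locally: obtain it from external storage and keep a
        # copy in the package store so later builds do not fetch it again.
        package_store[name] = fetch_external(name)
    return package_store[name]
```

Passing the fetch function in as a parameter keeps the sketch independent of whether the external storage is a website, a program, or something else.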

Through the installation system configured as above, a first drive file 260a comprising the master node drive file 3000, the storage node drive file 4000, the calculation node drive file 5000, and so on is generated, and the resulting first drive file 260a is stored in the package storage unit 230 of the installation system on the client 200.

Thereafter, at the time the user needs it, the installation package 100 for implementing the present invention does not install a single program or system. Rather, it additionally has the function of installing one or more programs or systems, connecting them to each other, and thereby installing a distributed processing system capable of performing a specific function.

For the distributed processing system to be installed on various clouds through the installation system, the cloud environment may include, depending on the location of the physical system, a public cloud, a private cloud, a hybrid cloud, or a cloud provided by one or more commercial cloud service providers. The distributed processing system comprises one or more master systems 300 that are installed first on the cloud through the installation system, one or more calculation systems 500 that, when data to be analyzed is input, process all or part of the data received from the master system 300, and one or more storage systems 400 that store the processed data.

The master system 300 is the part of the distributed processing system of the present invention that is installed first in the cloud through the first drive file 260a generated by the installation system. The master node drive file 3000, which belongs to the first drive file 260a, contains the compute node drive file 5000. After it is installed on a node allocated in the cloud, and when data and a task are input at the user's request, it calculates the number of compute nodes and the resources of the compute nodes and allocates the compute node 500.

The storage system 400 is transmitted and installed at the time the master node drive file 3000 is transmitted to the cloud. The installed storage system 400 holds the data processed by the compute system 500, the information required for data analysis, and the like.

The compute system 500 is created by the master system 300 when the master system 300 and the storage system 400 are operating normally and data and a task are input by the user. At the time it is installed, the compute system 500 contains the network information of the master system 300. Using that network information, the newly installed compute system 500 registers its own network information with the master system 300 and connects to it.

FIG. 2 is a block diagram showing the configuration of the drive file 260 of the present invention.

The drive file 260 for installing the large-volume data distributed processing system applicable to various clouds includes the master node drive file 3000, the storage node drive file 4000, and the compute node drive file 5000. The initial drive file 260 is generated by building and compiling, in the container 250 of the installation system, the source code and binaries of the packages that make up the drive file 260. This initially generated drive file 260 may be defined as the first drive file 260a. The master node drive file 3000 of the drive file 260 contains the compute node drive file 5000 together with the other systems and information needed to install the compute system 500. The compute node drive file 5000 is contained in the master system 300 and is installed, on a node allocated in the selected cloud, as a system that processes data and tasks. The storage node drive file 4000 is installed as a system that can store the data to be analyzed by the compute system 500 and the data required for the analysis.

In one embodiment, the drive file 260 may be defined as a first drive file 260a or a second drive file 260b, according to the information it contains and the form it takes as installation of the distributed processing system proceeds.

FIG. 3 is an overall flowchart of a method of installing a distributed processing system for large-volume data processing applicable to various clouds, according to an embodiment of the present invention.

In the method of installing a distributed processing system for large-volume data processing applicable to various clouds, the installation system is first copied from the installation package 100 to the client 200 (S100).

The package storage unit 230 of the installation system copied from the installation package 100 contains the source code for the master system 300, the storage system 400, the compute system 500, and the other components of the distributed processing system, as well as the sources and binaries of required external components. A container 250 is created that suits the operating system environment of the clouds in which the distributed processing system is to be installed, or a compatible operating system environment. The source code and the sources and binaries of the required external components that were copied to the client 200 are copied or linked into the container 250, where they are built, compiled, and compressed. The result is stored in the package storage unit 230, and the container 250 is then deleted. In this way the first drive file 260a, comprising the master node drive file 3000 and the storage node drive file 4000 to be installed on the master node 301 and the storage node 401 allocated in the cloud, is generated and stored in the package storage unit 230 (S200).

A means is provided for selecting a cloud through the user interface 20 presented to the user and for allocating, in the selected cloud computing environment, the nodes that will serve as the master node 300 and the storage node 400 (S300).

To install the first drive file 260a, including the master node drive file 3000, on the nodes allocated in the cloud for each role, the network information of the allocated master node and of the allocated storage node is written into a configuration file. The first drive file 260a is then decompressed and the configuration file is injected into it to generate the second drive file 260b. The network information injected into the master node drive file 3000, the storage node drive file 4000, the compute node drive file 5000, and the like may include the master node IP, the master node's network interface, and the master node's SSH port (S400).
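The configuration file of step S400 can be pictured with a minimal Python sketch. The three fields come from the description; the JSON format, the class name `MasterNetworkInfo`, and the function `render_config` are assumptions made only for illustration, since the description does not specify the file format.

```python
import ipaddress
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MasterNetworkInfo:
    """Network information named in S400 (field names are hypothetical)."""
    master_ip: str
    network_interface: str
    ssh_port: int

    def __post_init__(self) -> None:
        # ip_address() raises ValueError for a malformed address.
        ipaddress.ip_address(self.master_ip)
        if not 1 <= self.ssh_port <= 65535:
            raise ValueError(f"invalid SSH port: {self.ssh_port}")


def render_config(info: MasterNetworkInfo) -> str:
    """Serialize the network information as a JSON configuration file (assumed format)."""
    return json.dumps(asdict(info), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_config(MasterNetworkInfo("10.0.0.5", "eth0", 22)))
```

Validating the address and port before the file is written keeps a malformed value from being packed into every drive file that receives the configuration.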

The second drive file 260b, that is, the master node drive file 3000 and the storage node drive file 4000 compressed with the executables and configuration files injected for installation in the selected cloud, is transmitted to the master node 301 and the storage node 401 allocated above, respectively. On each allocated node the second drive file is decompressed, the service is registered, and normal operation is verified, whereupon the first distributed processing system is in place (S500).

When the master system 300 and the storage system 400 are operating normally and data and a task are input by the user, the master system 300 allocates the resources of the compute nodes 501 and the number of compute nodes 501 that will serve as the compute system 500, so that the input data and task can be processed appropriately (S600).

After the compute nodes 501n are allocated as above, the master system 300 determines the compute node drive file 5000 and transmits it to the compute nodes 501n. The compute system 500 is then registered and its normal operation is verified. The installed compute node drive file 5000 contains the network information of the master system 300 (S700).

The method may further include a step in which, when the compute system 500 is installed and first starts operating, it registers its own compute system 500 information with the master system 300 by using the network information of the master system 300 stored in the compute system 500 (S800).

FIG. 4 is a diagram showing an installation system of a distributed processing system applicable to various clouds, according to an embodiment of the present invention.

FIGS. 5 to 5B show, in script file form, the order in which the packages of the master node drive file 3000 of the drive file 260 are built and compiled in the container 250, and the reference information files for those packages, according to an embodiment of the present invention.

Referring to FIG. 5, the objects labeled A, B, and C denote Package A, Package B, and Package C, which make up the master node drive file 3000. The reference information file of each package making up the master node drive file 3000 is read by means of programs and commands. Once the files have been read, the source code and binaries of the packages can be built and compiled in the order Package C, Package B, Package A, as determined by the reference information files.

FIGS. 5A to 5C show the contents of the reference information files for the packages, according to an embodiment.

Referring to FIG. 5A, a reference information file contains a package name, a required package name, location information, version information, executable file location information, and container file location information.

The package name is the name of the package that the reference information file describes.

The required package name identifies the package that must be built and compiled before this package.

The location information gives the location of the source code and binaries of the package.

The location information for a package may be marked as either internal or external.

The version information gives the version of the package.

The executable file location information gives the location of the file used to run the container 250.

The container file location information gives the location of the file that defines the environment in which the container 250 is configured for the source code and binaries of the package.

The reference information files for Package B and Package C may be configured in the same manner as the reference information file for Package A described above.
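As an illustration of the six fields listed above, the following Python sketch models a reference information file as a small record and parses it from a `key = value` text form. The text format, the key names, and every sample value are assumptions; the figures that show the actual script file form are not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ReferenceInfo:
    """One reference information file (field names are hypothetical)."""
    package_name: str
    required_package: Optional[str]  # None when nothing must be built first
    location: str                    # internal or external source/binary location
    version: str
    executable_path: str             # file used to run the container
    container_file_path: str         # file describing the container environment


def parse_reference_info(text: str) -> ReferenceInfo:
    """Parse an assumed 'key = value' layout into a ReferenceInfo."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return ReferenceInfo(
        package_name=fields["package_name"],
        required_package=fields.get("required_package") or None,
        location=fields["location"],
        version=fields["version"],
        executable_path=fields["executable_path"],
        container_file_path=fields["container_file_path"],
    )


# Sample content for Package A; all values are invented for the example.
PACKAGE_A = """
package_name = A
required_package = B
location = internal:packages/A
version = 1.0
executable_path = scripts/build.sh
container_file_path = containers/build-env.conf
"""

if __name__ == "__main__":
    print(parse_reference_info(PACKAGE_A))
```

An empty `required_package` value is mapped to `None`, which corresponds to a package such as Package C that has no prerequisite.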

With reference to FIGS. 4 to 5B, FIG. 6 describes the overall operation flow for generating the first drive file 260a. The flow shown in FIG. 6 corresponds to steps S100 to S200 of FIG. 3.

First, when the package driver 220 of the installation system copied from the installation package 100 to the client 200 (S100) is executed at the user's request (S10), the package driver 220 reads, through the package configuration information unit 210, the reference information files for Package A, Package B, and Package C, which make up the master node drive file 3000 (S11).

If the reference information file of Package A, used to generate the master node drive file 3000 of the first drive file 260a, lists Package B as a required package, the package driver 220 next reads the reference information file of Package B, which must be built and compiled first and which includes the location information of its source code and binaries. If the reference information file of Package B in turn lists a required package, the same procedure continues as a recursion or loop, and the reference information file of Package C is read. When the reference information file of Package C lists no required package, the recursion or loop ends, and the build and compile are prepared using the source code and binaries of the required package in the package storage unit 230. If the source code and binaries are not present in the package storage unit 230, they are downloaded from the external repository 270 and stored in the package storage unit 230 (S12).
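The recursive reading of required packages in step S12 can be sketched as follows. This is a minimal illustration in Python, assuming each package names at most one required package, as in the example of Packages A, B, and C; the function name `resolve_build_order` and the dictionary representation are hypothetical, and the cycle check is an added safeguard that the description does not mention.

```python
from typing import Dict, List, Optional, Set


def resolve_build_order(target: str, required: Dict[str, Optional[str]]) -> List[str]:
    """Return the packages to build, prerequisites first.

    `required` maps a package name to the package that must be built
    before it, or to None when it has no prerequisite.
    """
    order: List[str] = []
    in_progress: Set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return
        if name in in_progress:
            raise ValueError(f"circular requirement involving {name!r}")
        in_progress.add(name)
        prerequisite = required[name]  # KeyError if no reference file exists
        if prerequisite is not None:
            visit(prerequisite)        # recurse until a package has no prerequisite
        in_progress.discard(name)
        order.append(name)

    visit(target)
    return order


if __name__ == "__main__":
    # A requires B, B requires C, C requires nothing.
    print(resolve_build_order("A", {"A": "B", "B": "C", "C": None}))
```

For the chain in the description, the recursion bottoms out at Package C, so the resulting order is C, then B, then A.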

When the reference information file of Package C lists no prerequisite package, the container file location information in the reference information file is read and a container 250 is created for building and compiling the source code and binaries. Then, to run the build and compile in the container 250, the executable file location information in the reference information file is read and the executable file stored in the package storage unit 230 is linked to the container 250. At this point the package storage unit 230 and the container 250 are connected (S13).

When the container 250 is ready, the reference information file of Package C is read to obtain the location information of the source code and binaries. Location information is either internal or external. Internal location information refers to material that may be stored on the client 200, including the package storage unit 230. External location information refers to material that can be downloaded from the external repository 270 or an external storage medium and stored in the package storage unit 230. Once the package storage unit 230 is ready, the source code and binaries of Package C are transferred to the container 250 (S14), and the compile and build proceed using the executable file stored in the package storage unit 230 (S15).

The compiled and built Package C is then stored in the package storage unit 230, after which the container 250 in which the build and compile were completed is deleted (S16).

Once Package C has been built and compiled and stored in the package storage unit 230, and the container 250 has been deleted, the source code and binaries of Package B, which also forms part of the master node drive file 3000, are built and compiled through the recursion or loop. Package A is processed in the same way after Package B has been compiled and built. When Package A, Package B, and Package C have all been built and compiled, they are stored in the package storage unit 230 as the master node drive file 3000. The storage node drive file 4000 and the compute node drive file 5000 are generated through the same steps as the master node drive file 3000. When the master node drive file 3000, the storage node drive file 4000, the compute node drive file 5000, and the like are complete and no packages remain to be built and compiled (S17), the first drive file 260a constituting the distributed processing system of the present invention is generated (S18) (S200).
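The per-package cycle of steps S13 to S16 (create a container, transfer the sources, build, store the result, delete the container) can be sketched in Python as below. A temporary directory stands in for the container 250 and an upper-casing step stands in for the real compile; both are assumptions made so the example runs anywhere, and the names `build_container` and `build_packages` and the `src`/`built` layout are hypothetical.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path
from typing import Iterator, List


@contextlib.contextmanager
def build_container(log: List[str]) -> Iterator[Path]:
    """Create a throwaway build environment and always delete it afterwards."""
    workdir = Path(tempfile.mkdtemp(prefix="container-"))
    log.append("create")
    try:
        yield workdir
    finally:
        shutil.rmtree(workdir)  # S16: the container is deleted after the build
        log.append("delete")


def build_packages(order: List[str], package_store: Path, log: List[str]) -> None:
    """Build each package in its own container, prerequisites first."""
    out_dir = package_store / "built"
    out_dir.mkdir(exist_ok=True)
    for name in order:
        with build_container(log) as container:
            source = package_store / "src" / f"{name}.txt"
            staged = container / source.name
            shutil.copy(source, staged)                  # S14: transfer sources
            artifact = container / f"{name}.built"
            artifact.write_text(staged.read_text().upper())  # S15: stand-in build
            shutil.copy(artifact, out_dir / artifact.name)   # store the result
            log.append(f"build {name}")


if __name__ == "__main__":
    store = Path(tempfile.mkdtemp(prefix="store-"))
    (store / "src").mkdir()
    for pkg in "ABC":
        (store / "src" / f"{pkg}.txt").write_text(f"pkg {pkg}")
    events: List[str] = []
    build_packages(["C", "B", "A"], store, events)
    print(events)
```

Using a context manager guarantees that the build environment is removed even when a build step fails, which matches the create-then-delete lifecycle described for each package.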

FIG. 7 is a diagram showing a system configuration for installing the first distributed processing system 260c through the first drive file 260a generated by the installation system, according to an embodiment of the present invention.

FIGS. 8 and 9 are diagrams showing examples of the user interface 20 of the present invention.

FIG. 8 shows the first user interface 20a, which is a means for selecting a cloud.

To select a cloud 1000 through the first user interface 20a, the first user interface 20a presents, on the display device 10, a selection list 30 in which a plurality of clouds 1000 are shown as selectable objects. The selection list 30 may take the form of a first user interface 20a that offers the user at least one selection means among radio buttons, a file selection form, text, and the like. The means for selecting a cloud shown in FIG. 8 is one such selection means. When the selection of the cloud 1000 is complete, a second user interface 20b is provided for checking and modifying information on the nodes to be used in the selected cloud.

FIG. 9 shows the second user interface 20b, which is a means for checking and modifying information on the allocated nodes.

When the step of selecting the cloud 1000 is complete, information on the nodes allocated in the selected cloud is provided through the second user interface 20b. Information on multiple allocated nodes may be displayed through the second user interface 20b, which includes node settings 40 displaying information on the nodes allocated in the selected cloud 1000, and an install button 50 for confirming the environment and information of the allocated nodes and for installing on them the second drive file 260b generated after the first user interface 20a. The install button 50 may be arranged in the second user interface 20b together with the node settings 40.

In one embodiment, the second user interface 20b may also be displayed within the area of the first user interface 20a used for selecting a cloud.

FIG. 10 is a flowchart showing the operation flow of a system for installing the first distributed processing system 260c in the selected cloud through the first drive file 260a generated by the installation system, according to an embodiment of the present invention.

With reference to FIGS. 7 to 9, FIG. 10 describes the overall operation flow, which corresponds to steps S300 to S500 of FIG. 3.

To install the first drive file 260a, including the master node drive file 3000 generated by the installation system in step S200, in various clouds, the control unit 240 of the installation system selects a cloud at the user's request through the selection list 30, which displays a plurality of clouds in the first user interface 20a provided on the display device (S20). It then allocates, through the API provided by the selected cloud, the nodes that will serve as the master node 300 and the storage node 400 in the selected cloud computing environment. The nodes are allocated in the selected cloud with resource specifications optimized for the first drive file 260a, including the master node drive file 3000. The control unit 240 then receives the computing environment and network information for the selected cloud (S21) (S300).
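One way to keep the control unit independent of any particular cloud is to hide each cloud's API behind a common interface. The Python sketch below illustrates that idea only; the `CloudProvider` protocol, the `FakeCloud` class, and the addresses are all hypothetical, and no real cloud SDK is called.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class NodeInfo:
    """Network information returned for an allocated node (hypothetical shape)."""
    role: str
    ip: str
    ssh_port: int


class CloudProvider(Protocol):
    """Minimal interface the control unit would need from any cloud."""
    name: str

    def allocate_node(self, role: str) -> NodeInfo: ...


class FakeCloud:
    """In-memory stand-in for a cloud API, used only for this example."""

    def __init__(self, name: str, subnet: str) -> None:
        self.name = name
        self._subnet = subnet
        self._count = 0

    def allocate_node(self, role: str) -> NodeInfo:
        self._count += 1
        return NodeInfo(role=role, ip=f"{self._subnet}.{self._count}", ssh_port=22)


def allocate_initial_nodes(clouds: Dict[str, CloudProvider], selected: str) -> List[NodeInfo]:
    """Allocate the master and storage nodes in the cloud the user selected."""
    provider = clouds[selected]
    return [provider.allocate_node(role) for role in ("master", "storage")]


if __name__ == "__main__":
    available: Dict[str, CloudProvider] = {
        "public": FakeCloud("public", "10.0.0"),
        "private": FakeCloud("private", "192.168.1"),
    }
    for node in allocate_initial_nodes(available, "public"):
        print(node)
```

Supporting an additional cloud then means adding one more class that satisfies `CloudProvider`, while the selection and allocation logic stays unchanged.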

The received information is turned by the package driver 220 into configuration files in the format required by each part of the first drive file 260a stored in the package storage unit 230. When the first drive file 260a is decompressed into its folders, the configuration files are moved into a specific folder (S22), and the result is compressed again to generate the second drive file 260b (S23) (S400). The configuration files injected into the master node drive file 3000, the storage node drive file 4000, and the compute node drive file 5000 of the first drive file 260a may include the master node IP, the master node's network interface, and the master node's SSH port.
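Steps S22 and S23 (decompress, place the configuration file in a folder, compress again) can be sketched with the Python standard library. The gzip-compressed tar format, the function name `inject_config`, and the `conf/network.json` path are assumptions; the description does not name the archive format or the target folder.

```python
import tarfile
import tempfile
from pathlib import Path


def inject_config(first_drive_file: Path, second_drive_file: Path,
                  config_text: str, config_relpath: str = "conf/network.json") -> None:
    """Unpack the first drive file, add a configuration file, and repack it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        with tarfile.open(first_drive_file, "r:gz") as archive:
            # Use the safe extraction filter where this Python version supports it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(root, **kwargs)

        target = root / config_relpath                      # S22: place the config file
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(config_text)

        with tarfile.open(second_drive_file, "w:gz") as archive:  # S23: recompress
            for path in sorted(root.iterdir()):
                archive.add(path, arcname=path.name)


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp())
    (work / "src" / "bin").mkdir(parents=True)
    (work / "src" / "bin" / "run.sh").write_text("echo run")
    first = work / "first.tar.gz"
    with tarfile.open(first, "w:gz") as t:
        t.add(work / "src" / "bin", arcname="bin")
    second = work / "second.tar.gz"
    inject_config(first, second, '{"master_ip": "10.0.0.1"}')
    with tarfile.open(second, "r:gz") as t:
        print(t.getnames())
```

The first drive file is left untouched, so the same build output can be reused to produce a second drive file for a different cloud or a different set of nodes.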

The second drive file 260b, compressed in step S23 with the configuration files injected, is stored in the package storage unit 230. When the install button 50 of the second user interface 20b is used, the master node drive file 3000 and the storage node drive file 4000 of the second drive file 260b are transmitted to the nodes allocated in the selected cloud (S24).

The master node drive file 3000 and the storage node drive file 4000 of the second drive file 260b are decompressed (S25) by means of the executable file for the second drive file 260b stored in the package storage unit 230, and the first distributed processing system 260c is then created (S26).

The master system 300 and the storage system 400, which form the first distributed processing system, provide a message through a third user interface so that the user can confirm service registration and normal operation. When normal operation has been confirmed, the master node 300 and the storage node 400 are connected by means of the network information on the master node 300 held in the storage node 400 (S27) (S500).

FIGS. 11 and 12 show the distributed processing system applicable to various clouds and its operation flow, according to an embodiment of the present invention.

As shown in FIG. 11, the distributed processing system comprises a master node 300, a storage node 400, and a compute node 500 in the cloud selected by the user.

FIG. 12 shows the overall operation flow of the distributed processing system applicable to various clouds, and corresponds to steps S600 to S800 of FIG. 3.

When, following step S500, the master system 300 and the storage system 400 are operating normally and data and a task are input to the master system 300 by the user (S30), the master system 300 calculates the resources of the compute nodes 501 and the number of nodes needed to process the input data and task appropriately (S31) (S600).
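The description does not state how the number of compute nodes is calculated in step S31. Purely as an illustration, the Python sketch below sizes the cluster from the input data volume with a fixed per-node capacity and an upper limit; the policy itself, the function name `plan_compute_nodes`, and all parameters are assumptions and not the method of the invention.

```python
import math


def plan_compute_nodes(data_gb: float, gb_per_node: float, max_nodes: int) -> int:
    """Hypothetical sizing rule: one node per `gb_per_node` of input, capped at `max_nodes`."""
    if data_gb <= 0 or gb_per_node <= 0 or max_nodes < 1:
        raise ValueError("data_gb and gb_per_node must be positive and max_nodes at least 1")
    return min(max_nodes, math.ceil(data_gb / gb_per_node))


if __name__ == "__main__":
    print(plan_compute_nodes(data_gb=250, gb_per_node=100, max_nodes=10))
```

The cap reflects the case of an environment with limited computing resources, where the master system cannot allocate more nodes than are available.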

The compute nodes 501 are allocated according to the calculated number of compute nodes 501 and resources (S32), and the master system 300 then transmits the compute node drive file 5000 (S33). If an error occurs at a compute node 501 to which the compute node drive file 5000 is to be delivered, the master system 300 keeps repeating the request. If the error persists despite the repeated requests, the master system 300 allocates another compute node 501n and transmits the compute node drive file 5000 to it. Once the compute node drive file 5000 has been transmitted successfully to the compute node 501, registration and normal operation are verified (S34) (S700). The installed compute node drive file 5000 contains the network information of the master system 300.
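The error handling in step S33 (repeat the request, then fall back to a newly allocated node) can be sketched in Python as follows. The description speaks of continued requests without giving limits, so the bounded `max_attempts` and `max_replacements` values are assumptions, as are the function name `deploy_compute_node` and the use of `ConnectionError` to signal a failed transfer.

```python
from typing import Callable


def deploy_compute_node(node: str,
                        send: Callable[[str], None],
                        allocate_replacement: Callable[[], str],
                        max_attempts: int = 3,
                        max_replacements: int = 2) -> str:
    """Send the compute node drive file, retrying and then replacing the node.

    `send` raises ConnectionError when the transfer to a node fails.
    Returns the node that finally received the drive file.
    """
    current = node
    replacements = 0
    while True:
        for _ in range(max_attempts):
            try:
                send(current)
                return current
            except ConnectionError:
                continue  # repeat the request to the same node
        if replacements == max_replacements:
            raise RuntimeError("could not deliver the compute node drive file")
        current = allocate_replacement()  # allocate another compute node
        replacements += 1


if __name__ == "__main__":
    attempts = []

    def fake_send(target: str) -> None:
        attempts.append(target)
        if target == "node-1":
            raise ConnectionError("unreachable")

    spares = iter(["node-2"])
    print(deploy_compute_node("node-1", fake_send, lambda: next(spares)), attempts)
```

Bounding both loops keeps the master system from retrying forever when every candidate node is unreachable.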

Through the above steps the compute system 500 is created (S35). The method may include a step in which, when it first starts operating, the compute system 500 registers its own compute system 500 information with the master system 300 (S36), using the network information of the master system 300 stored in the compute system 500 (S800).
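The self-registration of step S36 can be pictured with a small in-process model in Python. A dictionary lookup stands in for the network connection to the master system, and the class names, field names, and addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MasterSystem:
    """Keeps a table of the compute systems that have registered."""
    address: str
    registered: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, address: str) -> None:
        self.registered[name] = address


@dataclass
class ComputeSystem:
    """Carries the master's address, as delivered in its drive file."""
    name: str
    address: str
    master_address: str

    def register_with(self, network: Dict[str, MasterSystem]) -> None:
        # Look up the master by the address stored at installation time,
        # then announce this compute system's own network information.
        master = network[self.master_address]
        master.register(self.name, self.address)


if __name__ == "__main__":
    master = MasterSystem(address="10.0.0.1")
    network = {master.address: master}
    worker = ComputeSystem(name="compute-1", address="10.0.0.7", master_address="10.0.0.1")
    worker.register_with(network)
    print(master.registered)
```

Because each compute system initiates the registration itself, the master system does not need to know the address of a compute node in advance, even when the node is created in a different cloud.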

FIGS. 13 to 16 show embodiments in which the distributed processing system installed in a cloud through the installation system is applied to various clouds 1000, according to an embodiment of the present invention. In FIGS. 13 to 16, M denotes the master system 300, S the storage system 400, and W the compute system 500.

FIG. 13 is an embodiment in which the distributed processing system applicable to various clouds according to the present invention is applied to a public cloud 1000a. Here the master system 300 installed by the installation system of the present invention installs, at the time data and a task are input, the compute system 500 for analyzing the data and task. Resources can additionally be scaled in and out (SCALE IN/OUT) by the number of nodes required.

FIG. 14 is an embodiment in which the distributed processing system applicable to various clouds according to the present invention is applied to a private cloud 1000b. A private cloud 1000b typically has limited computing resources. When installing in the private cloud 1000b, the distributed processing system, including the master system 300, the storage system 400, and the compute system 500, can be installed to fit the computing resources initially provided.

FIGS. 15 and 16 illustrate embodiments of the distributed processing system applicable to various clouds according to the present invention in multi-cloud and hybrid cloud environments.

A hybrid cloud typically refers to a cloud formed by combining two or more clouds.

FIG. 15 shows an embodiment using a hybrid cloud environment in which the distributed processing system installed through the installation system of the present invention has a master system 300 and a storage system 400 installed in each of two or more clouds (1000a, 1000b), and a calculation system 500 is installed when data and tasks are input.

FIG. 16 shows an embodiment in which the master system 300 and the storage system 400 of the distributed processing system are first installed in at least one cloud 1000a through the user interface of the control unit. Another cloud 1000b is then selected through the user interface of the control unit, the number of nodes and the computing resources are expanded from the master system 300, and the calculation system 500 is installed in the other cloud 1000b. The installed calculation system 500 connects to the master system 300 using the network information of the master system 300, which completes the distributed processing system. The cloud 1000a in which the master system 300 and the storage system 400 are installed may be at least one of a public cloud environment and a private cloud environment.

10: display device
20: user interface
30: selection list
40: node setting
50: installation button
100: installation package
200: client
210: package configuration information unit
220: package driver
230: package storage unit
240: control unit
250: container
260n: drive file
270: external storage
300: master system
301: master node
400: storage system
401: storage node
500: calculation system
501: calculation node
3000: master node drive file
4000: storage node drive file
5000: calculation node drive file

Claims

An installation system of a distributed processing system for processing large-capacity data applicable to various clouds, the installation system comprising:
A package configuration information unit including a reference information file for building and compiling the source code and binaries of a package, copied from the installation package to the client;
A package driver that creates and deletes containers for building and compiling according to the package configuration information unit;
Containers that are created and deleted by the package driver and in which building and compiling are performed for various clouds;
A drive file generated by building and compiling the source code and binaries of the package in the container;
A control unit for selecting a cloud in which to install the drive file and requesting the installation;
A package storage unit that stores the source code and binaries of the package and the built and compiled drive file; and
An external storage from which source code and binaries can be received from outside, in addition to the package storage unit.

According to claim 1,

The installation package is a specific program or service, in the form of a desktop PC or server application or a web service, capable of copying, building, and compiling a specific software source.

According to claim 2,

The installation package installs a single operating program or system, or installs one or more programs or systems and connects them to each other, so as to install a distributed processing system capable of performing a specific function.

According to claim 1,

The reference information file includes prerequisite package names, location information for the source code and binaries of the package, location information for executable files, version information, and location information for the container information file.
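As an illustration only, the fields listed in this claim could be held in a small record such as the Python sketch below. The claim does not define a file format, so the class name, the field names, and every value here are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReferenceInfo:
    """One package's reference information, with the fields listed in the claim."""
    package_name: str
    prerequisite_packages: Tuple[str, ...]  # names of packages to build first
    source_location: str                    # where the source code and binaries are
    executable_location: str                # where executable files are kept
    version: str
    container_info_location: str            # where the container information file is


# All values below are hypothetical placeholders.
calculation_node_info = ReferenceInfo(
    package_name="calculation-node",
    prerequisite_packages=("common-lib",),
    source_location="storage/src/calculation-node",
    executable_location="storage/bin",
    version="1.0.0",
    container_info_location="storage/containers/calculation-node.info",
)

print(calculation_node_info.prerequisite_packages)
```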

According to claim 1,

The package storage unit stores a plurality of executable files together with source code and binaries received from the external storage.

According to claim 1,

The control unit provides a user interface for selecting among various clouds and includes environment information for the various clouds.

According to claim 1,

The drive file is the output of the installation system for installing the distributed processing system in various clouds, and includes a master node drive file, a storage node drive file, and a calculation node drive file.

According to claim 1,

The external storage exists in various media such as websites and programs.

According to claim 1,

The drive file is stored in various storage media, can exist as a compressed file, is transmitted to the distributed nodes to be installed in various clouds, and lacks only the environment setting information required for connection after installation.

A method for installing a distributed processing system for processing large-capacity data applicable to various clouds, the method comprising:

Copying the installation system from the installation package to the client;

Generating a first drive file including a master node drive file, a storage node drive file, and a calculation node drive file through the installation system;

Selecting a cloud by a user and allocating a master node and a storage node in the selected cloud computing environment;

Generating a second drive file by injecting an environment setting file into the first drive file so that the first drive file can be installed in the selected cloud;

Transmitting the second drive file, comprising the master node drive file and the storage node drive file, to each node and decompressing it there to register a first distributed processing system and confirm its normal operation;

When the master system and the storage system are operating normally after the second drive file has been transmitted and registered, and data and tasks are input by the user to the master system, allocating the number of nodes to act as calculation nodes and the resources of those calculation nodes;

After the calculation nodes are allocated, transmitting the calculation node drive file from the master system to the calculation nodes, registering it, and confirming normal operation, the installed calculation node drive file including the network information of the master system; and

Configuring a second distributed processing system through the transmitting and registering of the calculation node drive file.

The method of claim 10,

The generating of the first drive file includes:

Reading, through the package driver and in response to a user request, a reference information file for a package constituting the drive file;

Checking whether there is a prerequisite package that must be built and compiled first in order to produce the drive file;

Creating a container through the package driver to build and compile the prerequisite package first, and then reading the reference information file for the package and connecting the container to the package storage unit so that the executable files in the package storage unit are available to the container;

Reading the location information of the source code and binaries for the package from the reference information file and transmitting the source code and binaries to the container;

Building and compiling the source code and binaries for the package in the container;

When building and compiling are complete, storing the completed package in the package storage unit and deleting the container;

If source code or packages remain that have not been built and compiled, creating a container again and repeating the build and compile recursively or in a loop; and

Generating the first drive file through the building and compiling of the source code and binaries.
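The build sequence in this claim (check prerequisites, create a container, build, store the result, delete the container, repeat) can be sketched in Python as a recursive walk over prerequisites. This is a simplified illustration, not the claimed implementation: the package names and the dependency table are hypothetical, containers are simulated by log entries, and the sketch assumes the prerequisites contain no cycles.

```python
from typing import Dict, List

# Hypothetical reference information: package name -> prerequisite package names.
REFERENCE_INFO: Dict[str, List[str]] = {
    "master-node": ["common-lib"],
    "storage-node": ["common-lib"],
    "calculation-node": ["common-lib", "analysis-lib"],
    "analysis-lib": ["common-lib"],
    "common-lib": [],
}


def build_package(name: str, store: Dict[str, str], log: List[str]) -> None:
    """Build `name` after its prerequisites, one throwaway container per package."""
    if name in store:  # already built and kept in the package storage unit
        return
    for prerequisite in REFERENCE_INFO[name]:
        build_package(prerequisite, store, log)  # recursive prerequisite build
    log.append(f"create container for {name}")
    store[name] = f"{name}.built"  # stands in for the build-and-compile result
    log.append(f"delete container for {name}")


package_store: Dict[str, str] = {}
build_log: List[str] = []
for target in ("master-node", "storage-node", "calculation-node"):
    build_package(target, package_store, build_log)

print(list(package_store))
```

Because each package is skipped once it is in the store, a shared prerequisite such as the hypothetical `common-lib` is built only once even though three targets depend on it.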

The method of claim 11,

The checking for a prerequisite package is performed by the package driver, which uses the reference information file for the package constituting the drive file, included in the package configuration information unit, to identify the prerequisite packages needed to generate the drive file and to build and compile them in advance.

The method of claim 12,

In reading the reference information file for the package constituting the drive file, if the package has a prerequisite package, the prerequisite is handled using a loop or recursive construct.

The method of claim 11,

In the creating of the container, the package driver creates the container using the environment information on various clouds included in the control unit, so that the container is suited to the various clouds.

The method of claim 11,

In the transmitting of the source code and binaries to the container, if the location information in the reference information file points to the package storage unit but the source code and binaries are not present there, they are obtained from the external storage and stored in the package storage unit.

The method of claim 10,

The selecting of a cloud and requesting of installation includes providing, from the control unit, a first user interface with a selection list for selecting the cloud to be used from among a plurality of clouds.

The method of claim 16,

The selection list is a first user interface element having at least one selection means among a radio button, a file object selection, and text.

The method of claim 10,

In the allocating of the master node and the storage node to the selected cloud, a second user interface for allocating nodes is provided through the control unit after the cloud selection and request step and may be displayed as a plurality of user interfaces, the second user interface including a node setting for entering the environment of the nodes to be allocated to the selected cloud and a completion button for proceeding.

The method of claim 10,

In the generating of the second drive file by injecting an environment setting file into the first drive file, after a plurality of first drive files suited to various cloud environments have been generated, the environment information for the cloud selected through the user interface is passed to the control unit, a configuration file is created in a specific folder and stored in the package storage unit through the package driver, and the first drive file is decompressed and the generated configuration file is injected into it to generate the second drive file.
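The injection step in this claim can be illustrated with Python's standard `zipfile` module. The sketch is an assumption-laden example rather than the claimed mechanism: the claim does not name an archive format, and the entry names, the JSON configuration layout, and the function names are hypothetical.

```python
import io
import json
import zipfile


def make_first_drive_file() -> bytes:
    """Create a stand-in compressed first drive file with no environment settings."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("bin/master-node", "placeholder executable")
    return buffer.getvalue()


def inject_settings(first_drive_file: bytes, settings: dict) -> bytes:
    """Copy the first drive file's entries and add a configuration file."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(first_drive_file)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for entry in source.infolist():
            target.writestr(entry, source.read(entry.filename))
        # The environment settings for the selected cloud (hypothetical layout).
        target.writestr("conf/environment.json", json.dumps(settings))
    return output.getvalue()


second_drive_file = inject_settings(
    make_first_drive_file(), {"cloud": "private", "master_nodes": 1}
)
with zipfile.ZipFile(io.BytesIO(second_drive_file)) as archive:
    print(archive.namelist())
```

Keeping the first drive file free of environment settings means the same build output can be reused for every cloud; only the small configuration entry differs between second drive files.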

The method of claim 10,

In the registering of the first distributed processing system as a service, the generated second drive file is transmitted to the nodes allocated in the selected cloud by means of a specific file, stored in the package storage unit, that contains the execution commands, and is decompressed there to register the service and confirm normal operation.

The method of claim 20,

In the confirming of normal operation of the first distributed processing system service, a third user interface is provided through the control unit so that the user can directly check messages confirming normal operation and service registration.

The method of claim 10,

In the registering of the second distributed processing system, a calculation node is allocated from the master system, and the calculation node drive file is transmitted to the allocated calculation node and decompressed there to register the service and confirm normal operation.

The method of claim 22,

When a transmission error occurs while the calculation node drive file is being transmitted to the calculation node allocated from the master system, the master system repeats the transmission request, or creates a new calculation node and transmits the file to it.
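The error handling in this claim offers two alternatives: repeat the request, or create a new calculation node. The Python sketch below combines them in sequence as one possible policy; that ordering, the retry limit, and all function and node names are assumptions made for illustration and are not stated in the claim.

```python
from typing import Callable, List


def deliver_drive_file(
    send: Callable[[str], bool],
    node: str,
    create_node: Callable[[], str],
    max_retries: int = 3,
) -> str:
    """Return the node that received the file, retrying before replacing the node."""
    for _ in range(max_retries):
        if send(node):  # repeated request from the master system
            return node
    replacement = create_node()  # fall back to a newly created calculation node
    if send(replacement):
        return replacement
    raise RuntimeError("drive file could not be delivered")


attempts: List[str] = []


def flaky_send(node: str) -> bool:
    """Simulated transfer that only succeeds for the replacement node."""
    attempts.append(node)
    return node == "calculation-2"


receiver = deliver_drive_file(flaky_send, "calculation-1", lambda: "calculation-2")
print(receiver, attempts)
```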

The method of claim 22,

The transmitted calculation node drive file includes the network information of the master system, and once it is installed in the calculation node to form the calculation system, the calculation system transmits its own information to the master system using that network information.

A large-capacity data distributed processing system applicable to various clouds, comprising: one or more master systems initially installed in the cloud through the installation system; one or more calculation systems that process all or part of the data through the master system when data to be analyzed is input; and one or more storage systems that store the processed data.

The method of claim 20,

The master system includes the calculation node drive file and, when data and tasks are input, calculates and allocates the number of calculation nodes and the resources of the calculation nodes from the master node.

The method of claim 20,

The storage system holds the information required for data analysis and the data processed by the calculation nodes.

The method of claim 20,

The calculation system includes the network information of the master node at the time of installation and registers the information of the calculation node with the master node using that network information.

The method of claim 25,

The distributed processing system, installed in various clouds through the installation system, is applicable, depending on the location of the physical systems, to a public cloud, a private cloud, a hybrid cloud, or clouds provided by one or more commercial cloud service providers.