KR101878291B1

KR101878291B1 - Big data management system and management method thereof

Info

Publication number: KR101878291B1
Application number: KR1020170063987A
Authority: KR
Inventors: 송민구; 최중인
Original assignee: 재단법인차세대융합기술연구원
Priority date: 2017-05-24
Filing date: 2017-05-24
Publication date: 2018-08-07
Also published as: WO2018216828A1

Abstract

The present invention relates to a method for managing energy big data using a Spark cluster server and a web client. The method comprises two steps. First, the Spark cluster server processes and manages energy big data collected in real time. Second, in an information transmission step, the Spark cluster server responds to a request from the web client by sending it the information to be displayed on the web client's screen. The step of processing the energy big data includes: classifying energy big data that contains structured and unstructured data; analyzing the classified energy big data; and storing the analyzed energy big data in the Spark cluster server. Because the Spark cluster server comprehensively manages and controls the operating state of the web clients that provide the energy big data, the energy big data can be managed efficiently according to purpose.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present disclosure relates generally to a system and method for managing big data. More particularly, it relates to an energy big data management system, and a method thereof, that stores and manages energy big data according to purpose. The energy big data is collected in real time in a Spark Streaming-based cloud system.

This section provides background information related to the present disclosure, which is not necessarily prior art.

Distributed cluster computing frameworks are gaining popularity as a way to cope with the ever-increasing volume of big data in the modern computing era. For example, Hadoop and Spark are growing rapidly. Many Internet service companies, such as Google, Facebook and Amazon, use these cluster computing platforms to deliver real-time services built on their own technologies, such as machine learning.

Specifically, Spark is a real-time distributed computing framework for big data that can execute big data processing at high speed in a distributed cluster environment.

Unlike Hadoop, the representative distributed framework, Spark is usually described with the term 'real-time'.

In terms of big data processing, Hadoop is slow because it goes through storage on HDFS (Hadoop Distributed File System), which increases the number of interactions. Spark, in contrast, uses in-memory processing as its default mode. This enables faster, lower-latency analysis, so Spark is expected to become a framework for next-generation big data processing.

Spark can read and write the big data it processes via HDFS, but subsequent processing is done in memory by default. It can therefore run faster than Hadoop on workloads with many iterative computations, such as machine learning or chart computation. For this reason, Spark is credited with running data analysis tasks more than 100 times faster than Hadoop MapReduce.
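To make the in-memory argument concrete, here is a toy Python sketch, not Spark code, that counts simulated storage reads in an iterative job. The `Storage` class, the `iterate` function and the averaging computation are all hypothetical. The sketch shows only that keeping loaded data in memory avoids rereading it on every iteration, which is the behavior the text attributes to Spark as opposed to disk-bound MapReduce.

```python
from typing import List


class Storage:
    """Hypothetical stand-in for disk/HDFS storage that counts how often it is read."""

    def __init__(self, data: List[float]) -> None:
        self._data = list(data)
        self.reads = 0

    def read(self) -> List[float]:
        self.reads += 1
        return list(self._data)


def iterate(storage: Storage, iterations: int, cache: bool) -> float:
    """Toy iterative job: repeatedly move an estimate halfway toward the data mean.

    cache=False rereads the input from storage on every iteration, like a chain
    of disk-bound batch jobs; cache=True reads it once and keeps it in memory.
    """
    cached = storage.read() if cache else None
    estimate = 0.0
    for _ in range(iterations):
        data = cached if cache else storage.read()
        target = sum(data) / len(data)
        estimate += 0.5 * (target - estimate)
    return estimate
```

Both modes compute the same result. Only the number of storage reads differs: 10 versus 1 for ten iterations.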

MapReduce has been criticized as a performance bottleneck in Hadoop clusters because it runs jobs in batch mode. Spark, by contrast, has emerged as an alternative to MapReduce because it processes analyses through short batch jobs of 5 seconds or less.

Here, big data refers to nearly all kinds of modern digital data that are large in volume, diverse in form and generated or updated at high speed. Such data is hard to process because much of it is not structured, and structured and unstructured forms coexist. Structured data is stored in fixed fields; examples include relational databases and spreadsheets. Unstructured data is not stored in fixed fields; examples include text documents suitable for text analysis and image, video or audio data that has not been preprocessed. Semi-structured data is not stored in fixed fields but includes metadata or a schema; examples include XML and HTML text. Big data is also called large-volume data.
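The three categories above can be illustrated with a rough heuristic. The sketch below is hypothetical and not part of the disclosure; the `FIXED_FIELDS` schema and the `classify` function are assumptions. It treats a record with exactly a fixed set of named fields as structured. Text that parses as XML, or that starts like an HTML document, counts as semi-structured. Anything else, such as free text or raw bytes, counts as unstructured.

```python
import xml.etree.ElementTree as ET
from typing import Any

# Hypothetical fixed schema for a structured record (e.g., one table row).
FIXED_FIELDS = {"meter_id", "timestamp", "kwh"}


def classify(record: Any) -> str:
    """Return 'structured', 'semi-structured' or 'unstructured' (heuristic only)."""
    if isinstance(record, dict) and set(record) == FIXED_FIELDS:
        return "structured"
    if isinstance(record, str):
        text = record.strip()
        if text.startswith("<"):
            try:
                ET.fromstring(text)
                return "semi-structured"  # well-formed XML
            except ET.ParseError:
                if text.lower().startswith(("<html", "<!doctype html")):
                    return "semi-structured"  # HTML that is not well-formed XML
    return "unstructured"
```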

This is described later, at the end of the Detailed Description for Carrying Out the Invention.

SUMMARY OF THE INVENTION

This section provides a general summary of the present disclosure and should not be construed as limiting its scope. It is not a comprehensive disclosure of the full scope or of all features.

According to one aspect of the present disclosure, there is provided a method of managing energy big data using a (Spark) cluster server and a web client. The method comprises: processing and managing, on the cluster server side, energy big data collected in real time; and an information transmission step in which the cluster server, in response to a request from the web client, transmits to the web client the information to be displayed on the web client's screen. The energy big data processing step comprises: classifying energy big data including structured data and unstructured data; analyzing the classified energy big data; and storing the analyzed energy big data in the cluster server.
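The claimed method reduces to a classify, analyze and store flow on the server, plus a request-driven transmission step. The sketch below models that flow in plain Python under stated assumptions. `EnergyServer`, the record layout (`field`, `value`), the "numeric value means structured" rule and the per-field averaging used as the "analysis" are all hypothetical. They were chosen only to show how the steps fit together.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class EnergyServer:
    """Hypothetical model of the server-side method: classify, analyze, store, serve."""

    def __init__(self) -> None:
        self.store: Dict[str, dict] = {}

    @staticmethod
    def classify(records: List[dict]) -> Dict[str, List[dict]]:
        # Step 1: split records into structured (numeric value) and unstructured (anything else).
        groups: Dict[str, List[dict]] = {"structured": [], "unstructured": []}
        for r in records:
            key = "structured" if isinstance(r.get("value"), (int, float)) else "unstructured"
            groups[key].append(r)
        return groups

    @staticmethod
    def analyze(groups: Dict[str, List[dict]]) -> Dict[str, dict]:
        # Step 2: toy analysis, the mean of numeric values and the count of text items per field.
        numbers: Dict[str, List[float]] = defaultdict(list)
        for r in groups["structured"]:
            numbers[r["field"]].append(r["value"])
        texts: Dict[str, int] = defaultdict(int)
        for r in groups["unstructured"]:
            texts[r["field"]] += 1
        fields = set(numbers) | set(texts)
        return {f: {"mean": mean(numbers[f]) if numbers[f] else None,
                    "text_items": texts[f]} for f in fields}

    def ingest(self, records: List[dict]) -> None:
        # Step 3: store the analysis result on the server.
        self.store.update(self.analyze(self.classify(records)))

    def handle_request(self, field: str) -> dict:
        # Information transmission step: return what the web client should display.
        return self.store.get(field, {})
```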

According to another aspect of the present disclosure, there is provided an energy big data management system using a cluster server and a web client. The system comprises: a management node that controls the cluster server; and a data node that processes and stores energy big data collected in real time. In response to a request from the web client, the management node causes information stored in the data node to be transmitted, or controls the data node so that it receives new energy big data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing an example of the overall configuration of an energy big data management system according to the present disclosure;
FIG. 2 is a diagram illustrating the concept of the RDD abstraction process according to the present disclosure;
FIG. 3 is a diagram illustrating an example of a cluster server according to the present disclosure;
FIG. 4 is a diagram illustrating an example of a data node according to the present disclosure.

The present disclosure will now be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing an example of the overall configuration of an energy big data management system according to the present disclosure.

Referring to FIG. 1, the energy big data management system 1 includes a cluster server 10 and a web client 20.

The cluster server 10 processes, stores and manages energy big data collected in real time. In the present disclosure, the cluster server 10 is preferably, but not necessarily, a cluster server based on the Spark framework.

In general, processing large volumes of energy big data on distributed servers requires a data analysis framework. Such a framework needs to satisfy various requirements, such as safety, data security, timeliness, reliability and anti-aging.

Providing the services mentioned above requires a data analysis framework that supports stream processing. Among existing data analysis frameworks, Spark Streaming, which supports second-scale processing through in-memory processing, is attracting attention.

Spark Streaming runs on Spark, which UC Berkeley proposed in 2012 to shorten the slow job execution times caused by frequent storage access in Apache Hadoop. To reduce storage access, Spark keeps in memory the intermediate results that a job reuses repeatedly, which shortens execution time. The data structure that holds and manages these intermediate results in memory is the RDD (resilient distributed dataset). It provides two kinds of methods: transformations and actions. Because Spark is a batch-processing engine, it does not fundamentally support stream processing. RDDs support parallel processing and are fault tolerant, so big data can be used and analyzed through a sequence of operations, as shown in FIG. 2.
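The transformation/action split and lineage-based fault tolerance can be illustrated with a toy, single-process model. This is not Spark's API; `MiniRDD` and its methods are hypothetical.

- Transformations (`map`, `filter`) only record a function in the lineage.
- Actions (`collect`, `count`) trigger computation partition by partition.
- Because of this, a lost partition could be recomputed by replaying the lineage over its source data.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable]


class MiniRDD:
    """Toy RDD: partitioned source data plus a lineage of lazy transformations."""

    def __init__(self, partitions: List[list], lineage: Tuple[Step, ...] = ()) -> None:
        self.partitions = partitions
        self.lineage = lineage

    # Transformations: return a new MiniRDD; nothing is computed yet.
    def map(self, fn: Callable) -> "MiniRDD":
        return MiniRDD(self.partitions, self.lineage + (("map", fn),))

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(self.partitions, self.lineage + (("filter", pred),))

    def compute_partition(self, i: int) -> list:
        # Replay the lineage over one source partition; a lost partition
        # would be rebuilt the same way.
        items = list(self.partitions[i])
        for kind, fn in self.lineage:
            items = [fn(x) for x in items] if kind == "map" else [x for x in items if fn(x)]
        return items

    # Actions: trigger computation over all partitions.
    def collect(self) -> list:
        return [x for i in range(len(self.partitions)) for x in self.compute_partition(i)]

    def count(self) -> int:
        return len(self.collect())
```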

However, as many industries came to require stream processing, Spark Streaming was developed for that purpose. Spark Streaming periodically passes the live stream data it receives as input to Spark in the form of micro-batches for processing. Spark Streaming accesses storage in two cases: when it saves input data for fault recovery, and when it rereads data that has been evicted from an RDD.
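The micro-batch model can be sketched as grouping timestamped stream records into fixed-length intervals, each of which is then processed as a small batch. The `micro_batches` function below is hypothetical and is not Spark Streaming's API. Its 5-second default only echoes the "5 seconds or less" figure given earlier and is not a Spark setting.

```python
from typing import Dict, Iterable, List, Tuple


def micro_batches(events: Iterable[Tuple[float, object]],
                  interval: float = 5.0) -> List[Tuple[float, list]]:
    """Group (timestamp_seconds, payload) events into consecutive micro-batches.

    Each batch covers [start, start + interval); batches come back in time
    order, and intervals with no events are skipped.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    batches: Dict[int, list] = {}
    # Sort by timestamp so batches, and items within them, are in time order.
    for ts, payload in sorted(events, key=lambda e: e[0]):
        batches.setdefault(int(ts // interval), []).append(payload)
    return [(index * interval, items) for index, items in batches.items()]
```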

Referring to FIG. 3, the cluster server 10 includes a management node 100, a data node 120 and an edge node 140.

The management node 100 controls the cluster server 10 in response to requests from the web client 20. It ensures either that information stored in the data node 120 is transmitted to the web client 20, or that new energy big data is transmitted to the data node 120 in real time.

Specifically, the management node 100 generates a control signal, namely an external energy data input signal, and transmits it to the data node 120. This causes external energy data generated from the external environment 22 to be received by the data node 120. Likewise, it generates a control signal, namely a field energy data input signal, and transmits it to the data node 120, causing field energy data generated from the field environment 24 to be received by the data node 120. The field environment 24 may include, for example, business facilities, public institutions and homes.

The management node 100 also generates a control signal, namely a search request signal, and transmits it to the data node 120. In response, the data node retrieves the relevant stored energy big data and sends the information to the web client 20 for display on its screen 21.

In addition, the management node 100 generates a control signal for failures of the cluster server 10 or the web client 20, namely a fault recovery signal, and transmits it to the data node 120 or the web client 20.

The cluster server 10 can check fault diagnosis and anomaly prediction results for the web client 20. The cluster server 10 controls the operating state of the web client 20 based on the accumulated energy big data, so the web client 20 operates smoothly. This allows active and voluntary participation in the energy-saving market.
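The text does not specify how fault diagnosis or anomaly prediction is carried out. Purely as an illustration, the sketch below substitutes a simple z-score rule, which is not the disclosure's method. A new reading is flagged when it lies more than `threshold` standard deviations from the mean of the accumulated history. The function name `is_anomalous` and the default threshold of 3.0 are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(history: Sequence[float], reading: float, threshold: float = 3.0) -> bool:
    """Flag `reading` if its z-score against the accumulated `history` exceeds `threshold`."""
    if len(history) < 2:
        return False  # not enough accumulated data to judge
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return reading != mu  # any deviation from a constant history is unusual
    return abs(reading - mu) / sigma > threshold
```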

In the present disclosure, FIG. 3 shows two management nodes 100, but the number is not limited thereto.

When there are two management nodes 100 and energy big data is transmitted and received in large volumes, the two nodes may take different roles depending on the environment of the cluster server 10 or of the web client 20. The first management node 110 may serve as the main management node and the second management node 112 as the auxiliary management node. Alternatively, only one of the two management nodes may be used, or both may be used.

When both management nodes 100 operate, the cluster server 10 is protected from overload. The energy big data management system therefore operates stably, and its reliability can increase.
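The main/auxiliary arrangement can be read as a simple failover rule. The sketch below is an illustration built on assumptions: the `ManagementNode` type, the `healthy` flag, the `ManagementPair` selector and the alternating `share_load` mode are all hypothetical. Control work goes to the main node while it is healthy and falls back to the auxiliary node otherwise. When both nodes are active and load sharing is enabled, the selector alternates between them.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass
class ManagementNode:
    name: str
    healthy: bool = True


class ManagementPair:
    """Hypothetical selector for a main/auxiliary management node pair."""

    def __init__(self, main: ManagementNode, aux: ManagementNode, share_load: bool = False) -> None:
        self.main, self.aux, self.share_load = main, aux, share_load
        self._turn = count()

    def pick(self) -> ManagementNode:
        healthy: List[ManagementNode] = [n for n in (self.main, self.aux) if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy management node")
        if self.share_load and len(healthy) == 2:
            return healthy[next(self._turn) % 2]  # alternate main/aux to spread load
        return healthy[0]  # main if healthy, otherwise auxiliary
```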

The data node 120 receives control signals from the management node 100 and manages the energy big data received from the web client 20. In the present disclosure, FIG. 3 shows six data nodes 120, but the number is not limited thereto.

The energy big data may include external energy data and field energy data, originating directly or indirectly from the external environment and from the web client 20. The external energy data may include terrain information, weather information, social information and the like for the external environment 22. The field energy data may include energy consumption, remaining energy and the like for the field environment 24.

Energy big data made up of such external energy data and field energy data can be classified into structured data and unstructured data. It could also be classified as semi-structured data, but this disclosure describes it in terms of structured and unstructured data.

Specifically, referring to FIG. 4, the data node 120 includes a data receiving unit 1210, a data analysis unit 1220, a data storage unit 1230 and a data management unit 1240.

The data receiving unit 1210 receives external energy data transmitted in real time. It also receives field energy data in response to control signals from the management node 100. Structured data in the energy big data may be collected through Kafka, and unstructured data through Flume.
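The split between Kafka for structured data and Flume for unstructured data amounts to a routing decision at ingestion time. The sketch below only simulates that decision with in-memory queues; it does not call the Kafka or Flume APIs. The queue names "kafka" and "flume", the `route` function and the `is_structured` rule (a dict of numbers or short identifiers) are assumptions for illustration.

```python
from collections import deque
from typing import Any, Deque, Dict


def is_structured(record: Any) -> bool:
    # Assumed rule: a dict whose values are all numbers or short identifier strings.
    return isinstance(record, dict) and all(
        isinstance(v, (int, float)) or (isinstance(v, str) and len(v) <= 32)
        for v in record.values()
    )


def route(record: Any, collectors: Dict[str, Deque[Any]]) -> str:
    """Queue structured records on the 'kafka' stand-in, everything else on 'flume'."""
    target = "kafka" if is_structured(record) else "flume"
    collectors[target].append(record)
    return target
```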

The data analysis unit 1220 can convert the collected energy big data into an analyzable form and analyze it by field.

Specifically, the structured data collected through Kafka and the unstructured data collected through Flume can be converted, using machine learning (MLlib) and Sqoop, into a form usable from a programming language, and then analyzed.

Sqoop transfers structured data from relational database management systems (RDBMS) into HDFS and HBase. MLlib (machine learning) is limited to a subset of supervised and unsupervised learning algorithms. It does, however, support machine learning implementations in several programming languages, including Python, Scala and Java. GraphX is a library for graph computation. Sqoop is preferably linked with Kafka and Flume for real-time data transmission.
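Moving relational rows into HBase, as Sqoop does, can be pictured as flattening each row into cells keyed by row key and `column_family:qualifier`. The following sketch is a conceptual illustration only. It uses Python's built-in `sqlite3` as the relational source and does not use Sqoop or HBase. The `export_to_cells` function, the table layout, the row-key choice and the column family name `d` are hypothetical.

```python
import sqlite3
from typing import Dict, Tuple


def export_to_cells(conn: sqlite3.Connection, table: str, key_column: str,
                    family: str = "d") -> Dict[Tuple[str, str], object]:
    """Flatten every row of `table` into {(row_key, 'family:column'): value} cells."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    cells: Dict[Tuple[str, str], object] = {}
    for row in cur:
        record = dict(zip(columns, row))
        row_key = str(record.pop(key_column))
        for col, value in record.items():
            if value is not None:  # HBase-style: missing values are simply not stored
                cells[(row_key, f"{family}:{col}")] = value
    return cells
```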

The data storage unit 1230 classifies the analyzed data by application field and stores it.

The data management unit 1240 includes three sub-units:

- a first management unit 1242 that provides information stored in the data storage unit 1230 in response to a request from the web client 20;
- a second management unit 1244 that visualizes the information stored in the data storage unit 1230 so that it can be used;
- a third management unit 1246 that manages the cluster server 10 and the web client 20.

The first management unit 1242 searches for the corresponding information so that information stored in the data node 120 can be transmitted in response to a request from the web client 20. It may include Spark SQL, which supports SQL queries over structured data in the Hadoop database (HBase) unit. It may also include SparkR, which connects Spark with R, a statistical tool useful for data science.

The second management unit 1244 may include Oozie or a similar tool, which arranges multiple jobs in sequence and performs workflow scheduling and monitoring.
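Oozie describes a workflow as a set of actions with dependencies. As a rough analogue, and not Oozie's workflow XML or API, the sketch below uses Python's standard `graphlib` (3.9+) to order jobs so that each runs only after its prerequisites. It records each job's status for monitoring, and a job whose upstream job did not succeed is skipped. The `run_workflow` function and the job names in the usage example are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_workflow(jobs: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Set[str]]) -> Dict[str, str]:
    """Run jobs in dependency order; return a status per job ('ok', 'failed', 'skipped')."""
    graph = {name: set(depends_on.get(name, ())) for name in jobs}
    status: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        if any(status.get(dep) != "ok" for dep in graph[name]):
            status[name] = "skipped"  # an upstream job did not succeed
            continue
        try:
            jobs[name]()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"
    return status
```

For example, `{"analyze": {"ingest"}, "store": {"analyze"}}` runs ingest, then analyze, then store.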

The third management unit 1246 may include ZooKeeper, which helps resolve the various failures and exceptions that occur in the field.

When the management node 100 inputs a search request signal, the data node 120 transmits information for display on the screen 21 of the web client 20, according to the web client's request. In doing so, it retrieves the field-arranged energy big data, taking the priority of the web client's request into account, and transmits the information.
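Field-wise storage with priority-aware retrieval could look like the following. This is a hypothetical sketch: the disclosure does not specify how priority is expressed. Here each request carries an integer priority, with a lower value served first. Pending requests are drained from a `heapq`, and ties are broken by arrival order. The `FieldStore` class and its methods are assumptions.

```python
import heapq
from collections import defaultdict
from itertools import count
from typing import Dict, List, Tuple


class FieldStore:
    """Hypothetical data-node store: records arranged by field, requests served by priority."""

    def __init__(self) -> None:
        self.by_field: Dict[str, List[dict]] = defaultdict(list)
        self._queue: List[Tuple[int, int, str, str]] = []
        self._seq = count()  # arrival counter used as a tie-breaker

    def put(self, field: str, record: dict) -> None:
        self.by_field[field].append(record)

    def request(self, client: str, field: str, priority: int) -> None:
        heapq.heappush(self._queue, (priority, next(self._seq), client, field))

    def serve_all(self) -> List[Tuple[str, List[dict]]]:
        """Answer queued requests, highest priority (lowest number) first."""
        replies = []
        while self._queue:
            _, _, client, field = heapq.heappop(self._queue)
            replies.append((client, list(self.by_field.get(field, []))))
        return replies
```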

When the management node 100 inputs an external energy data input signal and a field energy data input signal, the data node 120 receives energy data from the web client 20 in real time. External energy data may also be input to the data node 120 in real time without a separate external energy data input signal.
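The input rules above can be stated as a small state machine. External energy data is always accepted. Field energy data is accepted only after the management node has sent a field energy data input signal. A search request returns what has been received so far. The signal names mirror the text, but the `Signal` enum, the `DataNodeGate` class and its methods are hypothetical.

```python
from enum import Enum, auto
from typing import List, Optional


class Signal(Enum):
    EXTERNAL_DATA_INPUT = auto()
    FIELD_DATA_INPUT = auto()
    SEARCH_REQUEST = auto()


class DataNodeGate:
    """Hypothetical model of how a data node reacts to management-node signals."""

    def __init__(self) -> None:
        self.field_input_enabled = False
        self.received: List[tuple] = []

    def on_signal(self, signal: Signal) -> Optional[List[tuple]]:
        if signal is Signal.FIELD_DATA_INPUT:
            self.field_input_enabled = True
        elif signal is Signal.SEARCH_REQUEST:
            return list(self.received)  # information to send toward the web client
        # EXTERNAL_DATA_INPUT changes nothing: external data is accepted regardless.
        return None

    def offer(self, kind: str, payload: object) -> bool:
        """Accept 'external' data unconditionally, 'field' data only once enabled."""
        if kind == "external" or (kind == "field" and self.field_input_enabled):
            self.received.append((kind, payload))
            return True
        return False
```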

The edge node 140 connects the cluster server 10 and the web client 20 so that they can interact directly, or indirectly (for example, via a network). The network may be of any type, such as a local area network (LAN), a wide area network (WAN), a virtual private network (VPN) or the Internet. In the present disclosure, FIG. 3 shows two edge nodes 140, but the number is not limited thereto.

The web client 20 is preferably a computer capable of network communication (e.g., a PC), and there may be a plurality of web clients. The web client 20 may be an ordinary computer, but it may also be any type of machine or computing device. Examples include a desktop computer, a laptop computer, a tablet computer, a personal digital assistant (PDA) or a smartphone.

The plurality of web clients 20 may be of the same type or of different types. Each may include a transmitter/receiver (not shown) that transmits and receives data for communication over the network.

Various embodiments of the present disclosure are described below.

(1) A method of managing energy big data using a (Spark) cluster server and a web client. The method comprises: processing and managing, on the cluster server side, energy big data collected in real time; and an information transmission step in which the cluster server, in response to a request from the web client, transmits to the web client the information to be displayed on the web client's screen. The energy big data processing step comprises: classifying energy big data including structured data and unstructured data; analyzing the classified energy big data; and storing the analyzed energy big data in the cluster server.

A representative example of the web client is a PC, but the web client is not limited thereto. Any computing means that can display on a screen the information received from the cluster server (e.g., a mobile phone) may be used. This series of steps is an internal process of the server-side computer and is performed by software.

(2) The energy big data management method, wherein the energy big data collected in real time includes external energy data and field energy data.

(3) The energy big data management method, further comprising dividing the analyzed energy big data and arranging it by field.

(4) The energy big data management method, further comprising, before analyzing the energy big data, converting the energy big data received in real time into analyzable data.

(5) The energy big data management method, wherein the information transmission step includes a further operation. When the web client's request includes information on energy big data needed for field control, the field-arranged energy big data corresponding to the request is analyzed or retrieved.

(6) The energy big data management method, wherein the classified energy big data is analyzed or retrieved taking into account the priority of the web client's request.

(7) An energy big data management system using a cluster server and a web client. The system comprises: a management node that controls the cluster server; and a data node that processes and stores energy big data collected in real time. In response to a request from the web client, the management node causes information stored in the data node to be transmitted, or controls the data node so that it receives new energy big data.

(8) The energy big data management system, further comprising an edge node that relays information and signals between the cluster server and the web client.

(9) The energy big data management system, wherein the cluster server consists of two management nodes, six data nodes and two edge nodes.

(10) The energy big data management system, wherein one of the two management nodes operates as a main management node and the other operates as an auxiliary management node.

(11) The energy big data management system, wherein the data node acts when the management node inputs a command signal for receiving energy big data. The data node then receives external energy data and field energy data through the edge node, classifies the data into structured and unstructured data, analyzes the classified data, and arranges and stores it by field.

(12) The energy big data management system, wherein the data node converts the received energy big data into analyzable data.

(13) The energy big data management system, wherein the data node acts when, in response to a request from the web client, the management node inputs a command signal for transmitting information stored in the data node to the web client. The data node then retrieves the field-arranged energy big data corresponding to the request and transmits that information to the web client through the edge node.

With the energy big data management system and method according to the present disclosure, the cluster server comprehensively manages and controls the operating state of the web clients that provide the energy big data. As a result, the energy big data can be managed efficiently according to purpose.

In addition, managing energy big data with a cluster server improves data stability through data processing and management. Performance can also be improved continuously through software and database upgrades.

Further, the operating state of the web clients is managed in real time, with real-time responses to management requests and to equipment fault diagnoses. This makes it possible to participate actively and voluntarily in the energy-saving market based on energy information.

10: Cluster server 20: Web client
100: Management node 110: First management node
112: Second management node 120: Data node
140: Edge node

Claims

delete

An energy big data management system using a cluster server and a web client, comprising:
a management node that controls the cluster server; and
a data node that receives, analyzes, stores and manages energy big data generated from the web client,
wherein the energy big data includes external energy data and field energy data, and
the data node
receives the external energy data in real time, and
receives the field energy data according to a control signal of the management node.

The energy big data management system of claim 7,
further comprising an edge node that connects the cluster server and the web client over a network.

The energy big data management system of claim 8,
wherein the cluster server consists of two management nodes, six data nodes and two edge nodes.

The energy big data management system of claim 9,
wherein one of the two management nodes operates as a main management node and the other operates as an auxiliary management node.

The energy big data management system of claim 7,
wherein the data node includes a data receiving unit, a data analysis unit, a data storage unit and a data management unit.

The energy big data management system of claim 11, wherein
the data receiving unit receives the energy big data generated from the web client,
the data analysis unit converts structured data and unstructured data into a programming language using machine learning (MLlib) and Sqoop,
the data storage unit stores the data analyzed by the data analysis unit, and
the data management unit includes a first management unit that provides information in the data storage unit to the web client in response to a request from the web client, a second management unit that visualizes the information stored in the data storage unit, and a third management unit that manages the cluster server and the web client.