KR20140007300A

KR20140007300A - System and method for processing sensor stream data based hadoop

Info

Publication number: KR20140007300A
Application number: KR1020130080478A
Authority: KR
Inventors: 권준호; 홍봉희; 전승우; 정도성; 조춘옥; 곽윤식; 양점옥
Original assignee: 부산대학교 산학협력단
Priority date: 2012-07-09
Filing date: 2013-07-09
Publication date: 2014-01-17
Also published as: KR101496011B1

Abstract

A Hadoop-based sensor stream data processing system according to the present invention comprises: a plurality of sensor nodes that are randomly distributed over a network, monitor a predetermined state, and output stream sensor data; a server node that communicates with the sensor nodes and stores and integrally manages a plurality of events in the network; and a Hadoop Distributed File System (HDFS) unit that is connected to the server node and distributes and stores data transmitted from the server node into storage areas of a predetermined unit according to MapReduce on Hadoop.

Description

Hadoop-based sensor stream data processing system and method {SYSTEM AND METHOD FOR PROCESSING SENSOR STREAM DATA BASED HADOOP}

The present invention relates to distributed sensor stream data processing, and more particularly to a distributed sensor stream data processing system and method that converts stream data collected over a preset period into data of a predetermined size and delivers it to the file system used by a MapReduce-based large-scale distributed data processing system.

Stream data refers to continuously generated data that is treated as a flow (stream) of information and processed according to predefined rules. Data generated by sensors can be regarded as one kind of stream data.

As sensor technology advances, sensor performance improves and prices fall, so sensors of many kinds are increasingly used in every field of industry. With this growing use, the amount of sensor stream data that must be collected, processed, and managed is increasing rapidly.

Many technologies for collecting, processing, and managing large volumes of data have been studied in recent years. Among them, the MapReduce model from Google (USA), which builds large clusters at low cost for distributed management of large data sets and distributed parallel processing of jobs, has attracted attention.

Representative distributed parallel processing systems based on the MapReduce model include Google's MapReduce system and Hadoop from the Apache Software Foundation.

MapReduce-based distributed parallel processing systems are designed primarily for batch processing of large volumes of data that have already been collected. Real-time processing of continuously collected stream data has received little consideration, so a system that addresses this has been needed.

In such existing MapReduce-based large-scale distributed data processing systems, large chunks of data are split into large storage units when distributed so that large volumes of data can be processed efficiently. For example, the file systems of the Windows operating system used on ordinary computers use a basic storage unit on the order of kilobytes, whereas the Hadoop Distributed File System (HDFS) used by Apache Hadoop uses 64 MB as its basic storage unit. A 64 MB unit is very inefficient for sensor stream data, where a single packet is only tens of bytes, and frequent access to the file system degrades the performance of the whole system.
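To make the size mismatch concrete, the short sketch below works out how many sensor packets fit in one 64 MB storage unit. The 100-byte packet size is an assumption chosen for illustration (the background only says "tens of bytes"), and the function name is hypothetical.

```python
# Rough arithmetic only: how many small sensor packets fill one 64 MB unit.
BLOCK_SIZE_BYTES = 64 * 1024 * 1024   # 64 MB basic storage unit cited in the text
PACKET_SIZE_BYTES = 100               # assumed packet size, for illustration only


def packets_per_block(block_size: int = BLOCK_SIZE_BYTES,
                      packet_size: int = PACKET_SIZE_BYTES) -> int:
    """Return how many whole packets of packet_size fit in one block."""
    return block_size // packet_size


if __name__ == "__main__":
    print(packets_per_block())  # 671088
```

Under these assumptions a single storage unit holds several hundred thousand packets, which is why writing each packet to the file system individually is wasteful.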

To solve this problem, the present invention proposes a distributed sensor stream data input system that gathers stream data for a certain period, converts it into large chunks, and delivers them to the file system used by a MapReduce-based large-scale distributed data processing system.

According to one aspect of the present invention, the system comprises: sensor nodes that are randomly distributed in a network, monitor a predetermined state, and output stream sensor data; a server node that communicates with the plurality of sensor nodes and stores and integrally manages a plurality of events in the network; and a Hadoop Distributed File System (HDFS) unit that operates in conjunction with the server node and distributes and stores data transmitted from the server node into storage areas of a preset unit according to MapReduce in Hadoop.

According to another aspect of the present invention, the method comprises: collecting, at preset intervals, the stream sensor data generated by each sensor node, at a server node that integrally manages a plurality of sensor nodes; dividing the file volume of the collected stream sensor data to generate a chunk composed of file blocks to be stored in an HDFS unit connected over a network; requesting the HDFS unit to store the generated chunk and receiving the storage result for that chunk; and, in conjunction with the server node, distributing and storing the data transmitted from the server node into storage areas of a preset unit according to MapReduce in Hadoop.

The Hadoop-based sensor stream data processing system and method according to the present invention can be expected to provide the following effects.

First, high-speed data processing close to real time can be performed.

Second, continuously collected streams can be processed.

Third, large volumes of stream data can be processed.

FIG. 1 is a schematic block diagram of a Hadoop-based sensor stream data processing system according to an embodiment of the present invention.
FIG. 2 is a detailed block diagram of the server node in the Hadoop-based sensor stream data processing system 200 according to an embodiment of the present invention.
FIG. 3 is a flowchart of a Hadoop-based sensor stream data processing method according to an embodiment of the present invention.
FIG. 4 is a flowchart of the operations at the server node in the Hadoop-based sensor stream data processing method according to an embodiment of the present invention.
FIG. 5 is a processing flow diagram of the continuous-query indexing technique applied to the Hadoop-based sensor stream data processing system according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The following description includes specific details, such as particular components, that are provided only to aid a more general understanding of the invention. It will be apparent to those of ordinary skill in the art that these details may be modified or changed within the scope of the present invention.

The present invention relates to Hadoop-based sensor stream data processing. More specifically, it provides a component that processes real-time sensor stream data and a component that supports statistical analysis of historical sensor data. Stream sensor data from each sensor node distributed in the network is collected at preset intervals, the file volume of the collected data is divided to generate a chunk composed of file blocks, and the generated chunk is stored in the Hadoop Distributed File System (HDFS), which reduces the server performance degradation caused by frequent file system access. Historical sensor data is then fetched from the HDFS and processed and analyzed using MapReduce. By combining the results obtained from the two components, the invention aims to provide a technique that enables situation analysis and state assessment for the current distributed sensor stream data processing system.

It will also be apparent that the service node according to an embodiment of the present invention is preferably applicable to any information and communication device or multimedia device that can communicate with a server over a network, and to applications thereof.

Hereinafter, a Hadoop-based sensor stream data processing system according to an embodiment of the present invention will be described in detail with reference to FIG. 1.

FIG. 1 is a schematic block diagram of a Hadoop-based sensor stream data processing system according to an embodiment of the present invention.

Referring to FIG. 1, the system 100 to which the present invention is applied includes a plurality of sensor nodes 110, 112, and 114, a server node 116, and a Hadoop Distributed File System (HDFS) unit 118.

It includes a predetermined livestock 110, livestock identification units 112 and 114, a management server 116, and a plurality of service nodes 118, 120, and 122.

The sensor nodes 110, 112, and 114 are randomly distributed in the network, monitor a predetermined state, and output stream sensor data.

The server node 116 communicates with the plurality of sensor nodes 110, 112, and 114 over the network, and stores and integrally manages a plurality of events in the network.

The HDFS unit 118 operates in conjunction with the server node 116 and distributes and stores the data transmitted from the server node 116 into storage areas of a preset unit according to MapReduce in Hadoop.

The server node is described in more detail with reference to FIG. 2, which is a detailed block diagram of the server node in the Hadoop-based sensor stream data processing system 200 according to an embodiment of the present invention. The server node 212 includes a data collection unit 214, a chunk generation unit 216, a chunk processing request unit 218, a data transmission unit 220, a control unit 222, a temporary data storage unit 224, a data adapter layer unit 226, a data collection layer unit 228, and a data processing layer unit.

In outline, the server node 212 applied to the present invention is organized around the control unit 222 into a component that processes real-time sensor stream data and a component that supports statistical analysis of historical sensor data. The data collection unit 214 collects stream sensor data in real time, checks the amount of collected data in real time, and, when specific requirements are met, transmits the data to the real-time data processing unit 20 and the history data processing unit 21 of the data processing unit 230. The control unit 222 keeps checking the size of the collected stream sensor data until it reaches a size, or a time limit, specified by the user.

The history data processing unit 21 receives data from the real-time data processing unit 230, stores it temporarily until it reaches 64 MB, and then transmits it for storage in the HDFS unit 210.

The history data processing unit 21 also fetches stream sensor history data from the HDFS unit 210 and processes and analyzes it using MapReduce.
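As a rough illustration of the kind of statistical analysis the history data processing unit might run, the sketch below mimics the map, shuffle, and reduce phases in plain Python and computes a per-sensor mean. It does not use the Hadoop API. The `(sensor_id, value)` record layout and all function names are assumptions made for this example, not details taken from the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]          # (sensor_id, value): assumed record layout
Partial = Tuple[float, int]         # (sum of values, count)


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Partial]]:
    """Map: emit (sensor_id, (value, 1)) for each historical record."""
    for sensor_id, value in records:
        yield sensor_id, (value, 1)


def shuffle(pairs: Iterable[Tuple[str, Partial]]) -> Dict[str, List[Partial]]:
    """Shuffle: group the mapped pairs by key, as the framework would."""
    groups: Dict[str, List[Partial]] = defaultdict(list)
    for key, partial in pairs:
        groups[key].append(partial)
    return groups


def reduce_phase(groups: Dict[str, List[Partial]]) -> Dict[str, float]:
    """Reduce: combine the partial sums and counts into a mean per sensor."""
    means: Dict[str, float] = {}
    for key, partials in groups.items():
        total = sum(value for value, _ in partials)
        count = sum(n for _, n in partials)
        means[key] = total / count
    return means


def mean_per_sensor(records: Iterable[Record]) -> Dict[str, float]:
    """Run the three phases in sequence on an in-memory record list."""
    return reduce_phase(shuffle(map_phase(records)))
```

For example, `mean_per_sensor([("s1", 10.0), ("s1", 20.0), ("s2", 5.0)])` yields a mean of 15.0 for `s1` and 5.0 for `s2`.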

More specifically, the server node 212 includes: a data collection unit 214 that collects, at preset intervals, the stream sensor data generated by each sensor node; a chunk generation unit 216 that divides the file volume of the stream sensor data collected by the data collection unit 214 and generates a chunk composed of file blocks to be stored in the HDFS unit 210; and a chunk processing request unit 218 that requests the HDFS unit 210 to store the chunk generated by the chunk generation unit 216 and receives the storage result for that chunk.

Collecting sensor stream data from many sensor devices at the server and storing it in real time generally degrades server performance because of frequent file system access. To solve this concentrated-load problem of stream sensor data, the chunk generation unit 216 stores stream sensor data items of about 100 bytes each in units of chunk files.

That is, the chunk generation unit 216 converts the stream sensor data collected through the data collection unit 214 during a given period into a single chunk file.
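The chunking idea can be sketched as a small buffer that accumulates records and hands them off as one chunk once a size threshold is reached. This is a minimal sketch under stated assumptions: the class name `ChunkBuffer` is hypothetical, and `sink` is a stand-in callable for whatever actually writes to HDFS, which the text does not specify.

```python
from typing import Callable, List


class ChunkBuffer:
    """Accumulates small records and emits them as one chunk when full."""

    def __init__(self, chunk_size: int, sink: Callable[[bytes], None]) -> None:
        self.chunk_size = chunk_size      # e.g. 64 * 1024 * 1024 for 64 MB
        self.sink = sink                  # hypothetical stand-in for the HDFS write
        self._parts: List[bytes] = []
        self._size = 0

    def add(self, record: bytes) -> None:
        """Buffer one record; flush once the buffered size reaches chunk_size."""
        self._parts.append(record)
        self._size += len(record)
        if self._size >= self.chunk_size:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered records to the sink as a single chunk."""
        if not self._parts:
            return
        self.sink(b"".join(self._parts))
        self._parts = []
        self._size = 0
```

Calling `flush()` at the end of a collection period covers the case where the period expires before the buffer is full, so a partial chunk is still delivered.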

At this time, the control unit 222 controls the storage and management of the stream sensor data to prevent data loss, controls the chunk processing request unit 218 to issue a storage request for the generated chunk file so that the stream sensor data is stored in the HDFS unit 210, and transmits the chunk file to the HDFS unit 210 through the data transmission unit 220.

The server node 212 further includes: a data adapter layer unit 226 that parses the raw stream sensor data collected from each sensor node and converts it into data suited to MapReduce-based distributed parallel processing in Hadoop; a data collection layer unit 228 that relays, within the server node, the per-node stream sensor data delivered through the data adapter layer unit 226 and requests its storage in the HDFS unit 210; and a data processing layer unit 230 that manages queries registered in advance by users, analyzes a given query to generate a continuous-query index, and receives the stream sensor data to process the query.

More specifically, the data adapter layer unit 226 converts the raw sensor stream data generated by the sensor devices into a format that the system can process.
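A conversion step of this kind might look like the sketch below, which parses one raw text line into a typed record. The raw format `sensor_id,timestamp,value`, the `SensorReading` type, and the `parse_raw` function are all assumptions for illustration; the patent does not define the raw or converted formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """Structured form of one sensor record (assumed fields)."""
    sensor_id: str
    timestamp: int
    value: float


def parse_raw(line: str) -> SensorReading:
    """Parse one 'sensor_id,timestamp,value' line (assumed raw format)."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    sensor_id, timestamp, value = parts
    # int() and float() raise ValueError themselves on malformed numbers.
    return SensorReading(sensor_id, int(timestamp), float(value))
```

Rejecting malformed lines at the adapter keeps the later layers free of format checks.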

The data collection layer unit 228 stores all sensor data delivered through the data adapter layer unit 226 in a database and, at the same time, forwards it to the data processing layer unit 230 so that real-time data processing is also possible.

The data collection layer unit 228 also temporarily stores the data received from the data adapter layer unit 226: it receives the stream sensor data arriving in real time for continuous query processing, and buffers the received data for storage in the HDFS unit 210.

That is, to process and store large volumes of data on Hadoop, data is held in the temporary data storage unit 224 in 64 MB units. Storing the stream sensor data in this way makes it possible to extract real-time statistical information and also reduces the loss rate of stream sensor data.

The data processing layer unit 230 processes continuous queries through the real-time data processing unit 20. It manages the queries registered by users and builds a continuous-query index so that fast query response times are maintained even when processing large volumes of stream sensor data.

Referring to FIG. 5, which shows the processing flow of the continuous-query indexing technique applied to the Hadoop-based sensor stream data processing system according to an embodiment of the present invention, the data processing layer unit 230 provides a first-level index during query processing by applying an AVL tree to look up a specific column in the continuously collected stream sensor data.

Because an AVL tree offers fast lookups, the present invention uses it as the first-level index so that a specific column (sensor_id) can be found quickly in the continuously collected sensor data.
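For readers unfamiliar with AVL trees, the following is a minimal textbook-style sketch of insertion with rebalancing and lookup, keyed on `sensor_id`. It is not the patent's implementation; what is stored under each key is an assumption (here, an arbitrary value such as a list of query identifiers).

```python
from typing import Any, Optional


class Node:
    """One AVL tree node keyed by sensor_id."""

    def __init__(self, key: str, value: Any) -> None:
        self.key = key
        self.value = value
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1


def _height(node: Optional[Node]) -> int:
    return node.height if node else 0


def _update(node: Node) -> None:
    node.height = 1 + max(_height(node.left), _height(node.right))


def _rotate_right(y: Node) -> Node:
    x = y.left
    y.left = x.right
    x.right = y
    _update(y)
    _update(x)
    return x


def _rotate_left(x: Node) -> Node:
    y = x.right
    x.right = y.left
    y.left = x
    _update(x)
    _update(y)
    return y


def insert(node: Optional[Node], key: str, value: Any) -> Node:
    """Insert or overwrite key and return the (possibly new) subtree root."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
        return node
    _update(node)
    balance = _height(node.left) - _height(node.right)
    if balance > 1:                      # left-heavy
        if _height(node.left.left) < _height(node.left.right):
            node.left = _rotate_left(node.left)      # left-right case
        return _rotate_right(node)
    if balance < -1:                     # right-heavy
        if _height(node.right.right) < _height(node.right.left):
            node.right = _rotate_right(node.right)   # right-left case
        return _rotate_left(node)
    return node


def search(node: Optional[Node], key: str) -> Any:
    """Return the value stored under key, or None if the key is absent."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None
```

Because the tree rebalances on every insert, lookups stay logarithmic in the number of sensors even when sensor identifiers arrive in sorted order, which would degrade a plain binary search tree into a list.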

After the first-level index is built, the statistical information of the building is calculated, and a second-level index is provided using an R-Tree.

The R-Tree is the most representative multi-dimensional indexing technique, and it can be used to build a high-dimensional continuous-query index that fits the various fields of a sensor. For this reason the present invention uses an R-Tree as the second-level index.

After the operations using this continuous-query indexing technique, the continuous queries that match the incoming stream sensor data are found more efficiently, and the matched statistical information is used to combine the final query result, which is sent to the user.
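The matching step can be pictured as a point-in-rectangle test: each continuous query is a set of ranges over sensor fields (a multi-dimensional rectangle), and an incoming reading is a point. The sketch below deliberately uses a linear scan in place of an R-Tree, so it shows only the matching semantics, not the index structure or its performance. The `ContinuousQuery` type and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # inclusive (low, high) bounds for one field


@dataclass
class ContinuousQuery:
    """A registered query expressed as one range per sensor field."""
    query_id: str
    ranges: Dict[str, Range]


def matches(query: ContinuousQuery, reading: Dict[str, float]) -> bool:
    """True if the reading lies inside the query's rectangle on every field."""
    for field, (low, high) in query.ranges.items():
        value = reading.get(field)
        if value is None or not (low <= value <= high):
            return False
    return True


def matching_queries(queries: List[ContinuousQuery],
                     reading: Dict[str, float]) -> List[str]:
    """Linear scan stand-in for the R-Tree lookup described in the text."""
    return [q.query_id for q in queries if matches(q, reading)]
```

An R-Tree would answer the same question without visiting every query, by pruning whole groups of rectangles whose bounding box does not contain the point.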

In addition, the data processing layer unit 230 receives stream sensor data from the HDFS unit 210 through the history data processing unit 21 and supports statistical analysis of past situations using the MapReduce model.

That is, through the history data processing unit 21 it receives history information of previously stored stream sensor data from the HDFS unit 210 connected over the network, applies a statistical analysis algorithm to that history information to generate MapReduce modeling corresponding to the analyzed stream sensor data, combines it with the real-time stream sensor data, and transmits the result to the HDFS unit 210.

The server node 212 controls its overall operation through the control unit 222. The control unit computes the size of the collected stream sensor data using the number of sensor nodes, the preset stream sensor data period, the stream sensor data size, and the number of sensor hubs in the network as factors, and compares the result with the predefined stream sensor data size. If they are equal, the data is output to the data processing layer unit; if not, the comparison is repeated within the preset stream sensor data collection wait time.

More specifically, the control unit 222 registers the sensor nodes in the network using the Device Manager of the application management layer and, through the System Manager of the application management layer, first measures the generation rate of sensor data over a given time interval and defines the size of the data to be transmitted.

The sensor stream data collected from the sensor nodes is sent to the data collection layer unit. The size of the collected data is then measured using factors such as the number of sensor nodes, the data collection period, the sensor data size, and the number of sensor hubs.

If the data size equals the defined size, processing moves directly to the combining step. Otherwise, the comparison continues within the defined data collection wait time. If the wait time has been exceeded, processing moves to the combining step; if not, sensor stream collection continues. The combining step merges the result data from the real-time sensor stream data processing component and the historical sensor data statistical analysis component and sends the data to HDFS.

The configuration of the Hadoop-based sensor stream data processing system according to an embodiment of the present invention has been described above.

Hereinafter, a Hadoop-based sensor stream data processing method according to an embodiment of the present invention will be described in detail with reference to FIGS. 3 and 4.

FIG. 3 is a flowchart of a Hadoop-based sensor stream data processing method according to an embodiment of the present invention.

Referring to FIG. 3, in step 310 a server node that integrally manages a plurality of sensor nodes collects, at preset intervals, the stream sensor data generated by each sensor node.

At this time, the raw stream sensor data collected from each sensor node is parsed and converted into data suited to MapReduce-based distributed parallel processing in Hadoop.

In step 312, the file volume of the collected stream sensor data is divided to generate a chunk composed of file blocks to be stored in the Hadoop Distributed File System (HDFS) unit connected over the network.

Collecting sensor stream data from many sensor devices at the server and storing it in real time generally degrades server performance because of frequent file system access. To solve this concentrated-load problem of stream sensor data, chunk generation stores stream sensor data items of about 100 bytes each in units of chunk files.

In step 314, the HDFS unit is requested to store the generated chunk, and in step 316 the storage result for that chunk is received.

Thereafter, when a query is given by a user, the query is analyzed to generate a continuous-query index, and the stream sensor data is received to process the query.

In step 318, in conjunction with the server node, the data transmitted from the server node is distributed and stored in storage areas of a preset unit according to MapReduce in Hadoop.

Thereafter, history information of previously stored stream sensor data is received from the HDFS unit connected over the network, a statistical analysis algorithm is applied to that history information to generate MapReduce modeling corresponding to the analyzed stream sensor data, and the generated MapReduce modeling is combined with the real-time stream sensor data and transmitted to the HDFS unit.

Referring to FIG. 4, which shows the operation flow at the server node in the Hadoop-based sensor stream data processing method according to an embodiment of the present invention, in step 410 the sensor nodes are registered using the Device Manager of the application management layer.

In step 412, the System Manager of the application management layer first measures the generation rate of sensor data over a given time interval and defines the size of the data to be transmitted.

In step 414, stream sensor data is collected from the sensor devices, and the collected sensor stream data is sent to the data collection layer in the server node.

In step 416, the size of the collected data is measured using factors such as the number of sensor nodes, the data collection period, the sensor data size, and the number of sensor hubs.

If the measured data size equals the defined size, the flow moves to step 420; if not, it moves to step 418 and the comparison continues within the defined data collection wait time.

If the data collection wait time has been exceeded, the flow moves to the combining step, step 422; if not, it returns to step 414 and repeats the subsequent steps.

In step 422, the result data from the real-time sensor stream data processing component and the historical sensor data statistical analysis component is combined, and in step 424 the data is sent to HDFS.
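The branching in steps 416 through 422 can be summarized as a small decision function. This is a sketch under stated assumptions: the names `NextStep` and `next_step` are hypothetical, and the comparison uses `>=` rather than the strict equality described in the text so that an overshoot does not stall the loop.

```python
from enum import Enum


class NextStep(Enum):
    COMBINE = "combine"            # proceed to the combining step (step 422)
    KEEP_COLLECTING = "collect"    # return to collection (step 414)


def next_step(collected_bytes: int, target_bytes: int,
              waited_seconds: float, max_wait_seconds: float) -> NextStep:
    """Decide what to do after measuring the collected data size."""
    # Assumption: treat "at least the defined size" as reaching the target;
    # the text itself describes an equality comparison.
    if collected_bytes >= target_bytes:
        return NextStep.COMBINE
    # Size not reached: combine anyway once the wait time has been exceeded.
    if waited_seconds > max_wait_seconds:
        return NextStep.COMBINE
    return NextStep.KEEP_COLLECTING
```

The wait-time branch is what bounds latency: even at a low data rate, buffered readings reach the combining step after at most the configured wait.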

The Hadoop-based sensor stream data processing method and system according to the present invention can operate as described above. Although specific embodiments have been described, various modifications can be made without departing from the scope of the present invention. Accordingly, the scope of the present invention should be determined not by the described embodiments but by the claims and their equivalents.

214: data collection unit 216: chunk generation unit
218: chunk processing request unit 220: data transmission unit
222: control unit 224: temporary data storage unit
226: data adapter layer unit 228: data collection layer unit
230: data processing layer unit

Claims

A Hadoop-based sensor stream data processing system comprising:
a plurality of sensor nodes that are randomly distributed in a network, monitor a predetermined state, and output stream sensor data;
a server node that communicates with the plurality of sensor nodes and stores and integrally manages a plurality of events in the network; and
a Hadoop Distributed File System (HDFS) unit that interworks with the server node and distributes and stores the data transmitted from the server node in storage areas of a predetermined unit according to Hadoop-based MapReduce.

The system of claim 1, wherein the server node comprises:
a data collection unit configured to collect, at predetermined intervals, the stream sensor data generated by each sensor node;
a chunk generation unit configured to divide the file volume of the stream sensor data collected by the data collection unit and to generate chunks composed of file blocks for storage in the HDFS unit; and
a chunk processing request unit configured to request that the chunks generated by the chunk generation unit be stored in the HDFS unit and to receive the storage result for the chunks.

The system of claim 1, wherein the server node comprises:
a data adapter layer unit configured to parse the raw stream sensor data collected for each sensor node and to convert the raw data into data that guarantees distributed parallel processing based on MapReduce in Hadoop;
a data collection layer unit configured to store, in the server node, the stream sensor data for each sensor node delivered through the data adapter layer unit and to request its storage in the HDFS unit; and
a data processing layer unit configured to manage queries registered in advance by a user and, when a query is given, to analyze the query, generate a continuous index, and receive the stream sensor data to process the query.

The system of claim 3, wherein the data processing layer unit receives history information of previously stored stream sensor data from the HDFS unit interworking through a network, applies a statistical analysis algorithm based on the history information to generate a MapReduce model corresponding to the analyzed stream sensor data, combines the model with real-time stream sensor data, and transmits the result to the HDFS unit.

The system of claim 3, wherein the server node further comprises a control unit that controls the overall operation of the server node, calculates the size of the collected stream sensor data based on the number of sensor nodes, the preset collection period of the stream sensor data, the stream sensor data size, and the number of sensor hubs in the network, compares the calculated size with a predefined stream sensor data size, outputs the stream sensor data to the data processing layer unit if the two are equal, and, if they are not equal, continues the comparison operation against the predefined stream sensor data size within a preset stream sensor data collection waiting time.

The system of claim 3, wherein the data processing layer unit provides a first-level index by applying an AVL tree for searching a specific column in the continuously collected stream sensor data during query processing, calculates statistical information based on the first-level index, and provides a second-level index using an R-tree.

A Hadoop-based sensor stream data processing method comprising:
collecting, at predetermined intervals, stream sensor data generated by each sensor node, in a server node that integrally manages a plurality of sensor nodes;
dividing the file volume of the collected stream sensor data to generate chunks of file blocks for storage in a Hadoop Distributed File System (HDFS) unit connected through a network;
requesting that the generated chunks be stored in the HDFS unit and receiving a result of storing the chunks; and
distributing and storing, by the HDFS unit interworking with the server node, the data transmitted from the server node in storage areas of a predetermined unit according to MapReduce in Hadoop.

The method of claim 7, further comprising:
parsing the raw stream sensor data collected for each sensor node and converting the raw data into data that guarantees distributed parallel processing based on MapReduce in Hadoop; and
when a query is given by a user, analyzing the query to generate a continuous index and receiving the stream sensor data to process the query.

The method of claim 7, further comprising:
receiving history information of previously stored stream sensor data from the HDFS unit linked through a network;
generating a MapReduce model corresponding to the analyzed stream sensor data by applying a statistical analysis algorithm based on the history information; and
combining the generated MapReduce model with real-time stream sensor data and transmitting the result to the HDFS unit.

The method of claim 7, wherein the server node controls its overall operation, calculates the size of the collected stream sensor data based on the number of sensor nodes, the preset collection period of the stream sensor data, the stream sensor data size, and the number of sensor hubs in the network, compares the calculated size with a predefined stream sensor data size, outputs the stream sensor data to the data processing layer unit if the two are equal, and, if they are not equal, continues the comparison operation against the predefined stream sensor data size within a preset stream sensor data collection waiting time.