KR101645396B1

KR101645396B1 - Method of processing time-series big data and system thereof

Info

Publication number: KR101645396B1
Application number: KR1020150020464A
Authority: KR
Inventors: 최은미; 임유진
Original assignee: 국민대학교 산학협력단
Priority date: 2015-02-10
Filing date: 2015-02-10
Publication date: 2016-08-03

Abstract

Provided are a method and a system for processing big data which can quickly extract various levels of information scattered in time-series big data and provide it to a manager or a user. The big data processing method comprises the following steps: generating lower-layer event data, each composed of a key and a value associated with the acquisition date and time of raw data, by using the raw data of the big data; generating lists having values arranged in time order for each key by sorting the lower-layer event data based on the key; and generating upper-layer event data by using the lists.

Description

METHOD OF PROCESSING TIME-SERIES BIG DATA AND SYSTEM THEREOF

The present invention relates to a big data processing method and a system thereof, and more particularly, to a big data processing method and system capable of extracting and analyzing meaningful information of various levels scattered in time-series big data.

Big data refers to a vast amount of data that is difficult to collect, store, and analyze with a typical database system. Examples of big data include social data generated by social media and social network services, network traffic logs, web logs of web servers and applications, sensing data and logs obtained by sensor equipment, vehicle driving data, and weather data.

In big data that includes time-series data, such as the sensing data, vehicle driving data, and weather data mentioned above, the volume of data to be handled grows explosively over time, and various patterns form and repeat as time passes, so that the data comes to contain meaningful information at various levels.

Therefore, there is a need to develop a big data processing method and system capable of quickly extracting various levels of information from such a vast amount of time-series big data and providing it to a manager or a user.

The technical problem to be solved by the present invention is to provide a big data processing method and a system thereof capable of quickly extracting various levels of information scattered in time-series big data and providing it to a manager or a user.

A big data processing method according to an aspect of the present invention includes: generating lower-layer event data, each composed of a key and a value associated with the acquisition date and time of raw data, by using the raw data of big data; sorting the lower-layer event data based on the key to generate lists having values arranged in order of acquisition date and time for each of the keys; and generating upper-layer event data by using the lists.

A big data processing system according to another aspect of the present invention is a system for processing and analyzing time-series big data, and includes: a big data storage unit that stores the time-series big data; and an event data generation unit that generates lower-layer event data concerning the meaning represented by the raw data of the time-series big data by using the raw data, and generates upper-layer event data concerning the meaning represented by the lower-layer event data by using the lower-layer event data.

The big data processing method and system according to the present invention can hierarchically and quickly extract event data related to the basic information inherent in each piece of raw data constituting time-series big data, the higher-level information inherent in the extracted basic information, and the still higher-level information inherent in that higher-level information. By providing the extracted event data, or result information obtained by analyzing the event data, to a manager or a user, the method and system can enable efficient use of big data and accurate decision making by the manager or user.

A brief description of each drawing is provided so that the drawings cited in the detailed description of the present invention can be more fully understood.
FIG. 1 is a diagram for explaining a big data processing system according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining a big data processing method performed in the event data generation unit of the big data processing system according to an embodiment of the present invention.
FIGS. 3 to 6 are flowcharts for explaining each step of the big data processing method of FIG. 2 in more detail, and FIGS. 7 and 8 are diagrams illustrating the state of some data at each step of the big data processing method of FIG. 2.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and duplicate descriptions thereof are omitted.

The embodiments of the present invention are provided to explain the present invention more completely to those of ordinary skill in the art. The following embodiments may be modified in various other forms, and the scope of the present invention is not limited to the following embodiments. Rather, these embodiments are provided so that this disclosure will be more thorough and complete, and will fully convey the concept of the invention to those skilled in the art.

In describing the embodiments of the present invention, a detailed description of related known functions or configurations will be omitted when it is determined that it may unnecessarily obscure the subject matter of the present invention. In addition, numerals (for example, first, second, etc.) used in this specification are merely identifiers for distinguishing one component from another. Also, in this specification, when one component is referred to as being "connected" or "coupled" to another component, it should be understood that the one component may be directly connected or coupled to the other component, but may also be connected or coupled via yet another component in between, unless there is a specific statement to the contrary.

The terminology used in this specification is used to appropriately describe the embodiments of the present invention, and may vary depending on the intent of a user or operator or on the customs of the field to which the present invention belongs. Therefore, these terms should be defined based on the contents of this specification as a whole. Like reference numerals in the drawings denote like elements.

FIG. 1 is a diagram for explaining a big data processing system according to an embodiment of the present invention.

Referring to FIG. 1, the big data processing system 100 may include a big data storage unit 110, an event data generation unit 130, and an analysis unit 150. Although not shown in FIG. 1, the big data processing system 100 may further include a pre-processing unit, located upstream of the big data storage unit 110, that processes input time-series data into a form suitable for the big data storage unit 110, and a post-processing unit, located downstream of the analysis unit 150, that processes the processed and analyzed data into a form suitable for use by a manager or a user. For convenience of explanation, a detailed description of the pre-processing unit and the post-processing unit is omitted below.

The big data storage unit 110 may store the input time-series data. Each piece of time-series data may include an acquisition date/time field for the time, date, and the like at which the data was acquired, an identifier field for identifying the device associated with the data, the user of the device, and the like, and a numeric field for the numerical information acquired at that date and time. Each piece of time-series data may carry a primary meaning at the moment it was acquired, and, viewed over a period longer than the acquisition cycle, may carry meanings of a higher layer, that is, of a higher level than the primary meaning.

In some embodiments, the time-series data may be vehicle driving data acquired at a predetermined cycle. The vehicle driving data may include vehicle driving information such as speed, acceleration, and RPM acquired directly through equipment mounted on the vehicle, such as a VDR (Vehicle Driving Recorder), a sensor, or a black box. Alternatively, the vehicle driving data may include vehicle driving information acquired indirectly through a user terminal (for example, a smartphone) carried by an occupant of the vehicle, such as the position, speed, and acceleration obtainable through a GPS sensor or an acceleration sensor mounted on the smartphone. Here, the vehicle driving data may carry a primary meaning indicating, for each acquisition cycle, whether the driver is speeding, accelerating rapidly, and so on, and, viewed over a period longer than the acquisition cycle, may carry higher-layer meanings indicating continuous speeding, driving habits or tendencies, and the like.

In another embodiment, the time-series data may be sensing data acquired at a predetermined cycle from each sensor node of a sensor network for collecting information on the surrounding environment and/or object recognition information in a specific area or space. The sensor network may be a network for managing and operating a building, a parking lot, a large market, a factory, a power plant, or the like, and the sensing data may include predetermined information acquired by the corresponding sensor. Here, when the sensor network is a network implemented for the management and operation of a power plant and the sensing data is data acquired from a sensor for detecting the state of a power generation facility, the sensing data may carry a primary meaning indicating the operating state of the power generation facility for each acquisition cycle, and, viewed over a period longer than the acquisition cycle, may carry higher-layer meanings indicating, for example, whether the power generation facility has failed.

In yet another embodiment, the time-series data may be weather-related data concerning rainfall, snowfall, the number of earthquakes, temperature change, fine dust concentration, and the like; economic data concerning economic activity, such as gross national product, price indices, and total exports; marketing-related data concerning a particular company's product sales volume, product advertising spend, and the like; demographic data concerning total population, population growth rate, and the like; or social data such as traffic volume, the number of traffic accidents, and the number of crimes. Such data may carry a primary meaning for each acquisition cycle and a higher-level meaning when viewed over a period longer than the acquisition cycle.

The big data storage unit 110 may deliver the stored time-series data in raw form to the event data generation unit 130, and may store the event data of the lower, upper, and next-higher layers generated by the event data generation unit 130.

The big data storage unit 110 may be configured as a distributed file system including storage means (not shown) and management means (not shown), for example, a Hadoop distributed file system. The storage means may consist of, for example, a plurality of physically separate storages, and the storages may share and store the input time-series data and the hierarchical event data. The management means may control the storing of the time-series data in the storage means, the output of the time-series data from the storage means to the event data generation unit 130, the input of hierarchical event data from the event data generation unit 130 to the storage means, and the like. However, the technical idea of the present invention is not limited thereto, and the big data storage unit 110 may instead be configured as a cloud file storage system, a network storage system, or the like for storing big data.

The event data generation unit 130 may receive time-series data in raw form (hereinafter referred to as raw data) from the big data storage unit 110. To extract and analyze the various levels of meaning inherent in the big data, the event data generation unit 130 may generate event data of various layers, from lower-layer data to upper-layer data.

Specifically, the event data generation unit 130 may use the raw data to generate lower-layer event data concerning the meaning represented by each piece of raw data, and may use the lower-layer event data to generate upper-layer event data concerning the meaning represented by the lower-layer event data. In addition, the event data generation unit 130 may generate event data of the next-higher layer based on the generated upper-layer event data. In this way, the event data generation unit 130 may continue to generate higher-level event data based on previously generated lower-level event data.
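The layered generation described here can be pictured as repeatedly feeding one layer's output into the next. The following Python sketch is illustrative only: `build_hierarchy`, the two layer functions, the limit of 100, and the one-second sampling are all assumptions made for the example and are not taken from the specification.

```python
from typing import Callable, List, Sequence

Layer = Callable[[Sequence], List]

def build_hierarchy(raw: Sequence, layers: Sequence[Layer]) -> List[List]:
    """Apply each layer to the previous layer's output and keep every level."""
    levels = []
    current = raw
    for layer in layers:
        current = layer(current)
        levels.append(current)
    return levels

# Hypothetical lower layer: raw items are (second, speed) pairs;
# emit the seconds at which the speed exceeds an assumed limit of 100.
def lower(raw):
    return [t for t, speed in raw if speed > 100]

# Hypothetical upper layer: emit (start, end) pairs for
# speeding events that occur in back-to-back seconds.
def upper(seconds):
    return [(a, b) for a, b in zip(seconds, seconds[1:]) if b - a == 1]

raw = [(0, 90), (1, 105), (2, 110), (3, 95), (4, 120)]
levels = build_hierarchy(raw, [lower, upper])
```

Adding a further function to the `layers` list would correspond to a next-higher layer built from the upper-layer output.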

In generating event data hierarchically, the event data generation unit 130 may use, for example, a map-reduce mechanism.
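As a rough illustration of the map-reduce pattern itself, the sketch below runs the map, group-by-key, and reduce phases in memory. It does not use Hadoop, and the helper names (`map_reduce`, `emit_speeding`, `count_events`), the sample records, and the threshold of 100.0 are assumptions for the example only.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Minimal in-memory map-reduce: map, group by key, then reduce per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):      # map phase emits (key, value) pairs
            groups[key].append(value)          # grouping stands in for the shuffle
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Hypothetical map function: emit a pair only when the speed exceeds 100.0.
def emit_speeding(record):
    identifier, speed = record
    if speed > 100.0:
        yield identifier, speed

# Hypothetical reduce function: count the emitted events per identifier.
def count_events(key, values):
    return len(values)

records = [("car-001", 112.5), ("car-001", 80.0),
           ("car-002", 130.0), ("car-001", 120.0)]
result = map_reduce(records, emit_speeding, count_events)
```

In a distributed setting the grouping step would be carried out by the framework across nodes; here a dictionary plays that role.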

The hierarchical event data generation process of the event data generation unit 130 is described in more detail below with reference to FIGS. 2 to 8.

The analysis unit 150 may generate analysis result information concerning the meaning represented by the big data by using at least one of the raw data and the hierarchical event data stored in the big data storage unit 110. The analysis unit 150 may generate the analysis result information by analyzing at least one of the raw data and the hierarchical event data using various data mining analysis techniques.

The analysis unit 150 may deliver the analysis result information to a manager or a user to support the manager's or user's use of the big data and decision making. Although not shown in FIG. 1, depending on the implementation, the analysis unit 150 may also deliver the raw data or the hierarchical event data themselves to the manager or user to support their use of the big data and decision making.

FIG. 2 is a flowchart for explaining a big data processing method performed in the event data generation unit of the big data processing system according to an embodiment of the present invention, FIGS. 3 to 6 are flowcharts for explaining each step of the big data processing method of FIG. 2 in more detail, and FIGS. 7 and 8 are diagrams illustrating the state of some data at each step of the big data processing method of FIG. 2. Each step of the big data processing method shown in FIG. 2 may be performed in the event data generation unit 130 by at least one piece of software that implements the corresponding algorithm. The steps of the big data processing method shown in FIG. 2 are described below with reference also to FIGS. 3 to 8.

Event data generation process of the lower layer (S210)

Referring to FIG. 2, the event data generation unit 130 may use the input raw data to generate lower-layer event data, each composed of a key and a value associated with the acquisition date and time of the raw data (S210). Step S210 is described in more detail with further reference to FIG. 3, which shows an example of generating lower-layer event data for any one piece of the raw data.

First, the event data generation unit 130 may parse the raw data field by field (S310). That is, the event data generation unit 130 may parse the raw data into the acquisition date/time field, the identifier field, and the numeric field.

Here, the acquisition date/time field may be a field for the time, date, and the like at which the raw data was acquired. The identifier field may be a field for identifying the device, user, and the like corresponding to the raw data. When the raw data is vehicle driving data, for example, the numeric field may be a field for the speed, acceleration, and the like of the vehicle. As another example, when the raw data is sensing data related to a power generation facility, the numeric field may be a field for the power output of the facility. As yet another example, when the raw data is weather data, the numeric field may be a field for rainfall and the like.

The event data generation unit 130 may determine whether a record exists in the numeric field of the raw data (S320). If a record exists in the numeric field of the raw data, the event data generation unit 130 performs the steps described below; if no record exists in the numeric field, it may terminate the process without generating lower-layer event data.
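The parsing step (S310) and the record-existence check (S320) might look like the following minimal sketch. The specification does not define a wire format, so the comma-separated layout `timestamp,identifier,value`, the timestamp pattern, and the names `RawData`, `parse_raw`, and `has_record` are assumptions made purely for illustration.

```python
from datetime import datetime
from typing import NamedTuple, Optional

class RawData(NamedTuple):
    acquired_at: datetime      # acquisition date/time field
    identifier: str            # device or user identifier field
    value: Optional[float]     # numeric field; None when no record is present

def parse_raw(line: str) -> RawData:
    """Split one assumed 'timestamp,identifier,value' line into its three fields."""
    ts, ident, raw_value = (part.strip() for part in line.split(","))
    value = float(raw_value) if raw_value else None
    return RawData(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), ident, value)

def has_record(raw: RawData) -> bool:
    """Check whether the numeric field holds a record."""
    return raw.value is not None

full = parse_raw("2015-02-10 09:00:00,car-001,112.5")
empty = parse_raw("2015-02-10 09:00:01,car-001,")
```

A line such as `empty`, whose numeric field is blank, would be dropped before any condition is evaluated.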

If a record exists in the numeric field of the raw data, the event data generation unit 130 may determine whether the record of the numeric field satisfies a preset first condition (S330).

The first condition may be set by a user as a criterion for extracting meaning. Depending on the determination method, the first condition may be set in various ways, for example, as whether the record of the numeric field is greater than a predetermined threshold, whether it is less than a predetermined threshold, or whether it falls within a predetermined threshold range.

For example, when the raw data is vehicle driving data and the numeric field is a field for speed, the first condition may be set, as a criterion for determining whether the vehicle is speeding, as whether the record of the numeric field exceeds a certain speed.

When the raw data is sensing data related to a power generation facility and the numeric field is a field for power output, the first condition may be set, as a criterion for determining abnormal operation, as whether the record of the numeric field falls outside the error range of the power output during normal operation.

When the raw data is weather data and the numeric field is a field for rainfall, the first condition may be set, as a criterion for determining whether the rainfall is a hazardous amount such as would warrant a heavy rain advisory, as whether the record exceeds a certain amount of rainfall.
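One way to make the first condition user-configurable is to represent it as a predicate over the numeric record. The sketch below is an assumption-laden illustration: the factory names (`above`, `below`, `within`, `outside`) and every threshold value are invented for the example and do not come from the specification.

```python
from typing import Callable

Condition = Callable[[float], bool]

def above(threshold: float) -> Condition:
    """Condition met when the record is greater than the threshold."""
    return lambda v: v > threshold

def below(threshold: float) -> Condition:
    """Condition met when the record is less than the threshold."""
    return lambda v: v < threshold

def within(low: float, high: float) -> Condition:
    """Condition met when the record lies inside the closed range."""
    return lambda v: low <= v <= high

def outside(low: float, high: float) -> Condition:
    """Condition met when the record lies outside the closed range."""
    return lambda v: not (low <= v <= high)

# Hypothetical thresholds for the three examples in the text.
speeding = above(100.0)                 # speed over an assumed limit
abnormal_output = outside(95.0, 105.0)  # power output outside an assumed error range
hazardous_rain = above(30.0)            # rainfall over an assumed advisory level
```

Because each condition is just a callable, the same lower-layer generation code can be reused across the vehicle, power plant, and weather cases by swapping the predicate.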

If the numeric field record of the raw data satisfies the first condition, the event data generation unit 130 may generate the lower-layer event data (S340). At this time, the event data generation unit 130 may generate a lower-layer event name field related to the type of the lower-layer event, and may generate the lower-layer event data by forming the key from the records of the 'lower-layer event name, acquisition date/time, identifier' fields and forming the value from the record of the 'acquisition date/time' field (see LD1 to LDw in FIG. 7). Alternatively, the event data generation unit 130 may generate the lower-layer event data by forming the key from the records of the 'acquisition date/time, identifier' fields and forming the value from the records of the 'acquisition date/time, lower-layer event name' fields (see LD1 to LDw in FIG. 8).

For example, when the first condition is set as a criterion for determining whether the vehicle is speeding, and the numeric field record of the raw data exceeds the speed limit, the event data generation unit 130 may generate 'speeding' as the record of the lower-layer event name field and include that record in the key or the value to generate the lower-layer event data.

As another example, when the first condition is set as a criterion for determining whether a power generation facility is operating abnormally, and the numeric field record of the raw data falls outside the error range of the power output during normal operation, the event data generation unit 130 may generate 'abnormal' as the record of the lower-layer event name field and include that record in the key or the value to generate the lower-layer event data.

As yet another example, when the first condition is set as a criterion for determining whether the rainfall is hazardous, and the numeric field record of the raw data exceeds a certain amount of rainfall, the event data generation unit 130 may generate 'hazardous rainfall' as the record of the lower-layer event name field and include that record in the key or the value to generate the lower-layer event data.

The event data generation unit 130 may perform steps S310 to S340 described above for each piece of raw data to generate a plurality of pieces of lower-layer event data.
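The two key/value layouts described for step S340 can be sketched as follows. This is a hedged illustration, not the patented implementation: the function names, the tuple representation of keys and values, and the sample threshold are assumptions; only the field composition of each key and value follows the text.

```python
from datetime import datetime
from typing import Callable, Optional

def event_name_in_key(acquired_at: datetime, identifier: str,
                      value: Optional[float],
                      condition: Callable[[float], bool], name: str):
    """FIG. 7 style: key = (event name, date/time, identifier), value = date/time."""
    if value is None or not condition(value):
        return None                      # no numeric record, or condition not met
    return (name, acquired_at, identifier), acquired_at

def event_name_in_value(acquired_at: datetime, identifier: str,
                        value: Optional[float],
                        condition: Callable[[float], bool], name: str):
    """FIG. 8 style: key = (date/time, identifier), value = (date/time, event name)."""
    if value is None or not condition(value):
        return None
    return (acquired_at, identifier), (acquired_at, name)

t = datetime(2015, 2, 10, 9, 0, 0)
over_limit = lambda v: v > 100.0         # hypothetical first condition

in_key = event_name_in_key(t, "car-001", 112.5, over_limit, "speeding")
in_value = event_name_in_value(t, "car-001", 112.5, over_limit, "speeding")
skipped = event_name_in_key(t, "car-001", 80.0, over_limit, "speeding")
```

Calling either function once per raw record, and discarding the `None` results, yields the collection of lower-layer event data.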

List generation process (S220)

Referring to FIG. 2, the event data generation unit 130 may sort the lower-layer event data based on the key to generate lists having values arranged in chronological order for each of the keys (S220).

With further reference to FIG. 4, when sorting the lower-layer event data based on a key that has each of the field records as a partial key, the event data generation unit 130 may first sort the lower-layer event data based on the partial keys for the records other than the record of the acquisition date/time field, thereby arranging the values for each of those partial keys (S410). Then, the event data generation unit 130 may rearrange the arranged values in chronological order based on the partial key for the record of the acquisition date/time field, thereby generating the lists (S420).

For example, when the lower-layer event data have each of the records of the 'acquisition date/time, identifier, lower-layer event name' fields as partial keys (see LD1 to LDw in FIG. 7), the event data generation unit 130 may first arrange the values based on each of the partial keys for the records of the 'identifier' and 'lower-layer event name' fields, and then rearrange the arranged values in chronological order based on the record of the 'acquisition date/time' field to generate the lists (see LT1 to LTx in FIG. 7). When the record of the 'lower-layer event name' field is included as a partial key in this way, an upper-layer event associated with a single type of lower-layer event can be extracted through the upper-layer event data generation process (S230) described later. Meanwhile, when arranging the values using the partial keys for the records of the 'identifier' and 'lower-layer event name' fields, it does not matter which partial key is used first.

다른 예를 들어, 하위 계층의 이벤트 데이터들이 '획득일시, 식별자' 필드의 레코드들 각각을 부분 키로 갖는 경우(도 8의 LD1 내지 LDw 참조), 이벤트 데이터 생성부(130)는 먼저 '식별자' 필드의 레코드를 기준으로 값들을 정렬시키고, 이어서 '획득 일시' 필드의 레코드를 기준으로 정렬된 값들을 시간순으로 재정렬하여 상기 리스트들을 생성할 수 있다(도 8의 LT1 내지 LTx). 이와 같이 '하위 계층 이벤트 네임' 필드의 레코드가 부분 키로 포함되지 않고 값에 포함되는 경우에는, 후술되는 상위 계층의 이벤트 데이터 생성 프로세스(S230)을 통해 여러 종류의 하위 계층 이벤트들과 연관된 상위 계층의 이벤트 데이터가 추출될 수 있다.
For example, when the event data of the lower layer has each of the records of the 'acquisition date and time, identifier' field as a partial key (see LD1 to LDw in FIG. 8), the event data generation unit 130 first generates the ' (LT1 to LTx of FIG. 8) by rearranging the values based on the record of the " date and time of acquisition " If the record of the 'lower layer event name' field is not included in the partial key but is included in the value, the event data generation process of the upper layer (S230) Event data can be extracted.
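The two-stage ordering described above can be sketched in Python roughly as follows. This is only an illustrative sketch: the tuple layout `(acquired_at, identifier, event_name)`, the function name `build_lists`, and the sample identifiers are assumptions made here and do not appear in the disclosure.

```python
from collections import defaultdict
from datetime import datetime

def build_lists(lower_events):
    """Group lower-layer events by their non-time partial keys, then order each group by time.

    lower_events: iterable of (acquired_at, identifier, event_name) tuples (hypothetical layout).
    """
    lists = defaultdict(list)
    # Stage 1: arrange the values under the partial keys other than the acquisition time.
    for acquired_at, identifier, event_name in lower_events:
        lists[(identifier, event_name)].append(acquired_at)
    # Stage 2: reorder the values of each list chronologically.
    return {key: sorted(values) for key, values in lists.items()}

events = [
    (datetime(2015, 2, 10, 9, 0, 2), "car-1", "overspeed"),
    (datetime(2015, 2, 10, 9, 0, 1), "car-2", "overspeed"),
    (datetime(2015, 2, 10, 9, 0, 1), "car-1", "overspeed"),
]
lists = build_lists(events)
```

Keying on `identifier` alone and carrying `event_name` inside the value would correspond to the FIG. 8 variant, in which one list can mix several kinds of lower-layer events.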

Event data generation process of the upper layer (S230)

Referring to FIGS. 2, 5, and 6, the event data generation unit 130 may generate upper-layer event data using the generated lists (S230). FIGS. 5 and 6 each show, as one implementation of the upper-layer event data generation process shown in FIG. 2, the steps performed on any one list. The event data generation unit 130 may apply only one of the implementations shown in FIGS. 5 and 6 to the plurality of lists in parallel, but the technical idea of the present invention is not limited thereto. The event data generation unit 130 may instead apply one of the implementations of FIGS. 5 and 6 to some of the lists in parallel and the other implementation to the remaining lists in parallel.
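Because each list is processed independently, the per-list work can be distributed across workers. The sketch below illustrates this with a thread pool from the Python standard library. The choice of a thread pool, the function names, and the placeholder body of `process_list` are all assumptions for illustration; the disclosure does not specify the parallel execution mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def process_list(item):
    """Placeholder for the per-list steps of FIG. 5 or FIG. 6 (hypothetical body)."""
    key, values = item
    # Here the first value of each list simply stands in for a generated upper-layer event.
    return key, values[:1]

def process_all(lists, max_workers=4):
    # No list depends on another, so each one can be handed to a separate worker.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_list, lists.items()))

result = process_all({"car-1": [1, 2, 5], "car-2": [3, 4]})
```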

First, one implementation of the upper-layer event data generation process is described with reference to FIGS. 2 and 5. The event data generation unit 130 may determine whether the n-th value in any one list is the first value (S510). If the n-th value is the first value, the event data generation unit 130 may perform step S540 to generate upper-layer event data; if it is not the first value, the unit may perform the subsequent steps.

If the n-th value is not the first value, the event data generation unit 130 may parse the n-th value field by field (S520). Step S520 may be omitted when the value contains only one field. The following description takes as an example the case where the value contains the acquisition date and time field (the same applies to FIG. 6).

The event data generation unit 130 may determine whether the record difference between the acquisition date and time field of the n-th value and the acquisition date and time field of the (n-1)-th value satisfies a second condition (S530). Here, the second condition may be set as whether the record difference corresponds to the acquisition period of the raw data described above, so that the event data generation unit 130 can determine whether the values of the lower-layer event data relate to consecutively acquired raw data.

If the record difference between the acquisition date and time field of the n-th value and that of the (n-1)-th value does not satisfy the second condition, the event data generation unit 130 may generate an upper-layer event name field and an upper-layer event date and time field, thereby generating upper-layer event data (S540; see HD1 to HDy in FIGS. 7 and 8). If the record difference satisfies the second condition, the event data generation unit 130 may, without generating upper-layer event data, perform step S510 on the (n+1)-th value (S560).

For example, when the lower-layer event name relates to 'overspeed' by a vehicle driver, the event data generation unit 130 may determine, if the record difference between the acquisition date and time field of the n-th value and that of the (n-1)-th value does not satisfy the second condition, that overspeed continued for the period corresponding to the record difference. Accordingly, the event data generation unit 130 may generate the record of the upper-layer event name field as 'continuous overspeed' from a short-term perspective, and may generate the record of the upper-layer event date and time field based on each of the acquisition date and time field records and the record difference, thereby producing upper-layer event data.

As another example, when the lower-layer event name relates to an 'abnormal' state of a power generation facility, the event data generation unit 130 may determine, if the record difference between the acquisition date and time field of the n-th value and that of the (n-1)-th value does not satisfy the second condition, that the period corresponding to the record difference is a malfunction from a short-term perspective. Accordingly, the event data generation unit 130 may generate the record of the upper-layer event name field as 'continuous malfunction'.

As yet another example, when the lower-layer event name relates to 'dangerous rainfall', the event data generation unit 130 may determine, if the record difference between the acquisition date and time field of the n-th value and that of the (n-1)-th value does not satisfy the second condition, that concentrated heavy rain occurred during the period corresponding to the record difference, and may generate the record of the upper-layer event name field as 'continuous dangerous rainfall' from a short-term perspective.

The event data generation unit 130 may determine whether the n-th value in any one list is the last value (S550). If the n-th value is the last value, the event data generation unit 130 ends the process; if it is not the last value, the unit may perform step S510 on the (n+1)-th value (S560).
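The FIG. 5 flow for a single list can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `upper_events_fig5`, the tuple form of an upper-layer event, and the exact-equality test used for the second condition are illustrative choices, not details taken from the disclosure.

```python
from datetime import datetime, timedelta

def upper_events_fig5(times, period, upper_name):
    """Emit an upper-layer event at the first value and wherever the acquisition gap breaks.

    times: chronologically sorted acquisition times of one list.
    period: acquisition period of the raw data (the second condition is assumed to be gap == period).
    """
    upper = []
    for n, t in enumerate(times):
        is_first = (n == 0)                              # S510
        if is_first or (t - times[n - 1]) != period:     # S530: second condition not satisfied
            upper.append((upper_name, t))                # S540: event time taken from the n-th value
        # Otherwise the value continues the current run, so move to the next value (S560).
    return upper                                         # the loop ends after the last value (S550)

base = datetime(2015, 2, 10, 9, 0, 0)
times = [base + timedelta(seconds=s) for s in (0, 1, 2, 5, 6, 10)]
events = upper_events_fig5(times, timedelta(seconds=1), "continuous overspeed")
```

With a one-second acquisition period, the sample list yields one upper-layer event per run of consecutive samples, at offsets 0, 5, and 10 seconds.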

Next, another implementation of the upper-layer event data generation process is described with reference to FIGS. 2 and 6. The event data generation unit 130 may determine whether the n-th value in any one list is the first value (S610). If the n-th value is the first value, the event data generation unit 130 may perform step S650 to generate upper-layer event data; if it is not the first value, the unit may perform the steps described below.

If the n-th value is not the first value, the event data generation unit 130 may parse the n-th value field by field (S620). As described with reference to FIG. 5, step S620 may be omitted when the values contain only one field.

The event data generation unit 130 may determine whether the record difference between the acquisition date and time field of the n-th value and the acquisition date and time field of the (n-1)-th value satisfies the second condition (S630). As described with reference to FIG. 5, the second condition may be set as whether the record difference corresponds to the acquisition period of the raw data described above.

If the record difference between the acquisition date and time field of the n-th value and that of the (n-1)-th value does not satisfy the second condition, the event data generation unit 130 may perform step S640. If the record difference satisfies the second condition, the event data generation unit 130 may, without generating upper-layer event data, perform step S610 on the (n+1)-th value (S670).

When the record difference does not satisfy the second condition, the event data generation unit 130 may determine whether the record of the acquisition date and time field of the n-th value falls within a preset time range (S640). The time range may be set arbitrarily by an administrator or user so as to prevent repeated generation of upper-layer event data within a given time interval and to allow the data to be analyzed from a flexibly chosen temporal perspective.

If the record of the acquisition date and time field of the n-th value does not fall within the preset time range, the event data generation unit 130 may generate an upper-layer event name field and an upper-layer event date and time field, thereby generating upper-layer event data (S650; see HD1 to HDy in FIGS. 7 and 8). If the record falls within the preset time range, the event data generation unit 130 may, without generating upper-layer event data, perform step S610 on the (n+1)-th value (S670).

For example, when the lower-layer event name relates to 'overspeed' by a vehicle driver, the event data generation unit 130 may determine, if the record of the acquisition date and time field of the n-th value does not fall within the preset time range, that the period corresponding to the record difference is overspeed but is not a continuation of overspeed within the given time range. Accordingly, the event data generation unit 130 may generate the record of the upper-layer event name field as 'continuous overspeed', and may generate the record of the upper-layer event date and time field based on each of the acquisition date and time field records and the record difference, thereby producing upper-layer event data.

As another example, when the lower-layer event name relates to an 'abnormal' state of a power generation facility, the event data generation unit 130 may determine, if the record of the acquisition date and time field of the n-th value does not fall within the preset time range, that the facility was malfunctioning during the period corresponding to the record difference but that this is not a continuation of a malfunction within the given time range. Accordingly, the event data generation unit 130 may generate the record of the upper-layer event name field as 'continuous malfunction'.

As yet another example, when the lower-layer event name relates to 'dangerous rainfall', the event data generation unit 130 may determine, if the record of the acquisition date and time field of the n-th value does not fall within the preset time range, that concentrated heavy rain occurred during the period corresponding to the record difference but that it is not a continuation within the given time range, and may generate the record of the upper-layer event name field as 'continuous dangerous rainfall'.

The event data generation unit 130 may determine whether the n-th value in any one list is the last value (S660). If the n-th value is the last value, the event data generation unit 130 ends the process; if it is not the last value, the unit may perform step S610 on the (n+1)-th value (S670).
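The FIG. 6 variant adds a time-range check on top of the gap check. The sketch below assumes, for illustration only, that the preset time range is a window of fixed length measured from the most recently generated upper-layer event; the disclosure leaves the definition of the range to the administrator or user. The function name and tuple layout are likewise hypothetical.

```python
from datetime import datetime, timedelta

def upper_events_fig6(times, period, window, upper_name):
    """Like the FIG. 5 flow, but suppress events that fall inside a preset time range.

    window: assumed here to be measured from the previous upper-layer event.
    """
    upper = []
    last_event_time = None
    for n, t in enumerate(times):
        if n > 0:                                        # S610: not the first value
            if t - times[n - 1] == period:               # S630: consecutive acquisition
                continue                                 # S670: no event, next value
            if t - last_event_time < window:             # S640: inside the preset time range
                continue                                 # S670: no event, next value
        upper.append((upper_name, t))                    # S650
        last_event_time = t
    return upper

base = datetime(2015, 2, 10, 9, 0, 0)
times = [base + timedelta(seconds=s) for s in (0, 1, 2, 5, 6, 30)]
events = upper_events_fig6(times, timedelta(seconds=1), timedelta(seconds=10), "continuous overspeed")
```

In this sample the break at 5 seconds falls inside the 10-second window that follows the first event, so only the values at 0 and 30 seconds produce upper-layer events.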

Event data generation process of the next-higher layer (S240)

Referring to FIG. 2, the event data generation unit 130 may generate next-higher-layer event data using the upper-layer event data (S240). That is, the event data generation unit 130 can extract next-higher-layer event data that carry meaning from a longer-term perspective than the upper-layer event data.

Similarly to steps S220 and S230 described above, the event data generation unit 130 may arrange the upper-layer event data in chronological order and then generate next-higher-layer event data based on whether the upper-layer event data are continuous (or on whether they are continuous and fall within a given time range). More specifically, the event data generation unit 130 may define a key and a value based on the identifier field, the upper-layer event name field, and the upper-layer event date and time field included in the upper-layer event data, sort the upper-layer event data by the keys they share to generate lists having chronologically ordered values for each of the keys, and generate next-higher-layer event data using the lists.
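The same group, sort, and scan pattern can be reapplied one layer up, as in the sketch below. The text does not spell out the continuity rule for upper-layer events, so the `max_gap` threshold used here is purely an assumption, as are the function name, the tuple layout, and the sample values.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def next_higher_events(upper_events, max_gap, higher_name):
    """upper_events: iterable of (identifier, upper_name, event_time) tuples (hypothetical layout)."""
    # Define the key from the identifier and upper-layer event name, and the value from the event time.
    lists = defaultdict(list)
    for identifier, upper_name, event_time in upper_events:
        lists[(identifier, upper_name)].append(event_time)

    higher = []
    for (identifier, _), times in lists.items():
        times.sort()                                     # chronological order within each key
        for n, t in enumerate(times):
            # Assumed rule: upper-layer events no more than max_gap apart belong to one run.
            if n == 0 or t - times[n - 1] > max_gap:
                higher.append((identifier, higher_name, t))
    return higher

day = datetime(2015, 2, 10)
upper = [
    ("car-1", "continuous overspeed", day + timedelta(hours=9, minutes=5)),
    ("car-1", "continuous overspeed", day + timedelta(hours=9)),
    ("car-1", "continuous overspeed", day + timedelta(hours=11)),
]
higher = next_higher_events(upper, timedelta(minutes=10), "long-term overspeed")
```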

Of course, the event data generation unit 130 may also generate still higher-level event data from the next-higher-layer event data, in a manner similar to the process of generating next-higher-layer event data from the upper-layer event data.

Thus, for big data composed of vehicle driving data, the event data generation unit 130 can generate, from event data related to 'continuous overspeed' from a short-term perspective, higher-level event data related to 'long-term overspeed' from a medium-to-long-term perspective, the driver's driving tendencies and patterns, and the like.

Likewise, for big data composed of sensing data, the event data generation unit 130 can generate, from event data related to 'continuous malfunction' from a short-term perspective, higher-level event data related to 'long-term malfunction' from a medium-to-long-term perspective, whether the power generation facility has failed, and the like.

Likewise, for big data composed of weather data, the event data generation unit 130 can generate, from event data related to 'continuous dangerous precipitation' from a short-term perspective, higher-level event data related to 'issuance of a heavy rain advisory' from a medium-to-long-term perspective, seasonally recurring rainfall patterns, and the like.

In this way, in the big data processing system 100, the event data generation unit 130 can extract event data hierarchically, and the analysis unit 150 can provide an administrator or user with result information obtained by analyzing, on the basis of the hierarchical event data, the patterns, phenomena, incidents, and the like that recur in the big data. This enables the administrator or user to make accurate decisions and prepare efficient countermeasures.

In addition, because the event data generation unit 130 can process data in parallel when extracting data related to higher-level events from basic events and their related data, meaningful information can be extracted more quickly and the convenience of the administrator or user can be improved.

Although the present invention has been described in detail with reference to preferred embodiments, the present invention is not limited to those embodiments, and various modifications and changes may be made by those of ordinary skill in the art within the technical idea and scope of the present invention.

100: Big data processing system
110: Big data storage unit
130: Event data generation unit
150: Analysis unit

Claims

delete

A big data processing method comprising:
generating event data of a lower layer, each composed of a key and a value associated with the acquisition date and time of raw data, using the raw data of big data;
sorting the event data of the lower layer based on the key to generate lists having chronologically sorted values for each of the keys; and
generating event data of an upper layer using the lists,
wherein the raw data includes an acquisition date and time field, an identifier field, and a numeric field, the key includes an acquisition date and time field, an identifier field, and a lower layer event name field generated under a predetermined condition, and the value includes the acquisition date and time field, or the acquisition date and time field and a lower layer event name field, and
wherein the generating of the event data of the upper layer using the lists comprises:
parsing the n-th value of the sorted values by field;
determining whether a record difference between the acquisition date and time field of the n-th value and the acquisition date and time field of the (n-1)-th value satisfies a preset second condition; and
if the record difference does not satisfy the second condition, generating an upper layer event name field and generating an upper layer event date and time field based on the record of the acquisition date and time field of the n-th value, thereby generating the event data of the upper layer.

3. The method of claim 2,
wherein the generating of the event data of the lower layer comprises:
parsing each item of the raw data by field;
if a record exists in the numeric field of an item of the raw data, determining whether the numeric field record of that raw data satisfies a preset first condition; and
if the numeric field record of that raw data satisfies the first condition, generating the lower layer event name field, thereby generating the event data of the lower layer composed of the key and the value.

3. The method of claim 2,
wherein the generating of the lists comprises:
sorting the event data of the lower layer based on at least one partial key associated with a field other than the acquisition date and time field, thereby arranging the values for each of the partial keys; and
reordering the arranged values based on a partial key associated with the acquisition date and time field, thereby generating the lists having chronologically sorted values for each of the keys.

delete

A big data processing method comprising:
generating event data of a lower layer, each composed of a key and a value associated with the acquisition date and time of raw data, using the raw data of big data;
sorting the event data of the lower layer based on the key to generate lists having chronologically sorted values for each of the keys; and
generating event data of an upper layer using the lists,
wherein the raw data includes an acquisition date and time field, an identifier field, and a numeric field, the key includes an acquisition date and time field, an identifier field, and a lower layer event name field generated under a predetermined condition, and the value includes the acquisition date and time field, or the acquisition date and time field and a lower layer event name field, and
wherein the generating of the event data of the upper layer using the lists comprises:
parsing the n-th value of the sorted values by field;
determining whether a record difference between the acquisition date and time field of the n-th value and the acquisition date and time field of the (n-1)-th value satisfies a preset second condition;
if the record difference does not satisfy the second condition, determining whether the record of the acquisition date and time field of the n-th value falls within a preset time range; and
if the record of the acquisition date and time field of the n-th value does not fall within the time range, generating an upper layer event name field and generating an upper layer event date and time field based on the record of the acquisition date and time field of the n-th value, thereby generating the event data of the upper layer.

A big data processing method comprising:
generating event data of a lower layer, each composed of a key and a value associated with the acquisition date and time of raw data, using the raw data of big data;
sorting the event data of the lower layer based on the key to generate lists having chronologically sorted values for each of the keys; and
generating event data of an upper layer using the lists,
wherein the raw data includes an acquisition date and time field, an identifier field, and a numeric field, the key includes an acquisition date and time field, an identifier field, and a lower layer event name field generated under a predetermined condition, and the value includes the acquisition date and time field, or the acquisition date and time field and a lower layer event name field,
wherein the event data of the upper layer is composed of a key and a value associated with the upper layer event date and time, and
wherein the method further comprises, after the generating of the event data of the upper layer:
sorting the event data of the upper layer based on the key to generate lists having chronologically sorted values for each of the keys; and
generating event data of a next higher layer using the lists.

delete

A big data processing system for processing and analyzing time-series big data, comprising:
a big data storage unit that stores the time-series big data; and
an event data generation unit that generates event data of a lower layer related to the meaning indicated by raw data of the time-series big data using the raw data, and generates event data of an upper layer related to the meaning indicated by the event data of the lower layer using the event data of the lower layer,
wherein the event data generation unit generates the event data of the lower layer, each composed of a key and a value associated with the acquisition date and time of the raw data, using the raw data,
wherein the raw data includes an acquisition date and time field, an identifier field, and a numeric field, the key includes an acquisition date and time field, an identifier field, and a lower layer event name field generated under a predetermined condition, and the value includes the acquisition date and time field, or the acquisition date and time field and a lower layer event name field,
wherein the event data generation unit sorts the event data of the lower layer based on the key to generate lists having chronologically sorted values for each of the keys, and generates the event data of the upper layer using the lists, and
wherein the event data generation unit determines whether a record difference between the acquisition date and time field of the n-th value and the acquisition date and time field of the (n-1)-th value satisfies a preset second condition and, if the record difference does not satisfy the second condition, generates an upper layer event name field and generates an upper layer event date and time field based on the record of the acquisition date and time field of the n-th value, thereby generating the event data of the upper layer.

10. The method of claim 9,
wherein the event data generation unit:
parses each item of the raw data by field;
if a record exists in the numeric field of an item of the raw data, determines whether the numeric field record of that raw data satisfies a preset first condition; and
if the numeric field record of that raw data satisfies the first condition, generates the lower layer event name field, thereby generating the event data of the lower layer composed of the key and the value.

delete

10. The method of claim 9,
wherein the event data generation unit:
sorts the event data of the lower layer based on at least one partial key associated with a field other than the acquisition date and time field, thereby arranging the values for each of the partial keys; and
reorders the arranged values based on a partial key associated with the acquisition date and time field, thereby generating the lists having chronologically sorted values for each of the keys.

delete

A big data processing system for processing and analyzing time-series big data, comprising:
a big data storage unit that stores the time-series big data; and
an event data generation unit that generates event data of a lower layer related to the meaning indicated by raw data of the time-series big data using the raw data, and generates event data of an upper layer related to the meaning indicated by the event data of the lower layer using the event data of the lower layer,
wherein the event data generation unit generates the event data of the lower layer, each composed of a key and a value associated with the acquisition date and time of the raw data, using the raw data,
wherein the raw data includes an acquisition date and time field, an identifier field, and a numeric field, the key includes an acquisition date and time field, an identifier field, and a lower layer event name field generated under a predetermined condition, and the value includes the acquisition date and time field, or the acquisition date and time field and a lower layer event name field,
wherein the event data generation unit sorts the event data of the lower layer based on the key to generate lists having chronologically sorted values for each of the keys, and generates the event data of the upper layer using the lists, and
wherein the event data generation unit determines whether a record difference between the acquisition date and time field of the n-th value and the acquisition date and time field of the (n-1)-th value satisfies a preset second condition; if the record difference does not satisfy the second condition, determines whether the record of the acquisition date and time field of the n-th value falls within a preset time range; and, if the record of the acquisition date and time field of the n-th value does not fall within the time range, generates an upper layer event name field and generates an upper layer event date and time field based on the record of the acquisition date and time field of the n-th value, thereby generating the event data of the upper layer.

A big data processing system for processing and analyzing time-series big data, comprising:
a big data storage unit that stores the time-series big data; and
an event data generation unit that generates event data of a lower layer related to the meaning indicated by raw data of the time-series big data using the raw data, and generates event data of an upper layer related to the meaning indicated by the event data of the lower layer using the event data of the lower layer,
wherein the event data of the upper layer is composed of a key and a value associated with the upper layer event date and time, and
wherein the event data generation unit sorts the event data of the upper layer based on the key to generate lists having chronologically sorted values for each of the keys, and further generates event data of a next higher layer using the lists.

10. The method of claim 9,
wherein the big data storage unit comprises a distributed file system.

10. The method of claim 9,
wherein the big data storage unit further stores the event data of the lower and upper layers generated by the event data generation unit, and
wherein the system further comprises an analysis unit that generates analysis result information on the meaning indicated by the big data using at least one of the raw data and the event data of the lower and upper layers.