KR20170062358A

KR20170062358A - Apparatus and method for processing structured stream data

Info

Publication number: KR20170062358A
Application number: KR1020160058663A
Authority: KR
Inventors: 박경현; 원희선
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
Priority date: 2015-11-27
Filing date: 2016-05-13
Publication date: 2017-06-07
Also published as: KR102072236B1

Abstract

The present invention relates to a technique for processing structured stream data. An apparatus for processing structured stream data according to one aspect of the invention includes: a table generation unit that extracts a schema from stream data in XML format and creates a database table based on the extracted schema; and a data processing unit that collects stream data, preprocesses the collected stream data, and stores the preprocessed stream data in the database table.

Description

Apparatus and method for processing structured stream data

The present invention relates to a technique for processing structured stream data and, more particularly, to an apparatus and method for processing structured stream data that automatically extract a data schema from structured stream data lacking schema information and create relational tables from it, so that stream data can be stored automatically without user involvement.

Big-data analysis can be broadly divided into batch analysis and stream analysis according to the input type of the data. Stream data can in turn be divided into structured and unstructured data according to its form. When structured data is collected and stored in real time, it is generally stored in a database table that maps to the structure of the data.

Therefore, if no database table exists for the stream data to be collected, or if a storage table cannot be created because the stream data carries no schema information, the stream data cannot be stored efficiently in the database.

In addition, because existing stream data collection systems implement their collection logic at the code level, they cannot respond flexibly when a user needs to change or modify that logic.

Accordingly, the present invention has been devised to solve the above problems of the prior art. An object of the invention is to provide an apparatus and method for processing structured stream data that automatically extract a data schema from structured stream data lacking schema information and create relational tables from it, so that stream data can be stored automatically without user involvement.

Another object of the invention is to provide an apparatus and method for processing structured stream data that extend an existing workflow management tool to provide a workflow environment for stream data processing, so that users can easily collect and process stream data.

To achieve these objects, an apparatus for processing structured stream data according to one aspect of the invention includes: a table generation unit that extracts a schema from stream data in XML format and creates a database table based on the extracted schema; and a data processing unit that collects stream data, preprocesses the collected stream data, and stores the preprocessed stream data in the database table.

With the stream data processing technique of the invention, a data schema is extracted automatically from structured stream data lacking schema information and relational tables are created from it, so that stream data can be stored automatically without user involvement.

Furthermore, because the stream data processing technique of the invention provides a workflow environment for stream data processing, users can easily collect and process stream data.

FIG. 1 is a block diagram showing the configuration of an apparatus for processing structured stream data according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the database table creation procedure of the apparatus for processing structured stream data according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of stream data used in the database table creation procedure according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of a data graph constructed by the database table creation procedure according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of a maximum boundary schema extracted by the database table creation procedure according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of a minimum boundary schema extracted by the database table creation procedure according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of a schema tree generated by the database table creation procedure according to an embodiment of the present invention.
FIG. 8 is a diagram showing an example of a decomposed schema tree in the database table creation procedure according to an embodiment of the present invention.
FIG. 9 is a diagram showing an example of schema tree-relational table mapping information generated by the database table creation procedure according to an embodiment of the present invention.
FIG. 10 is a diagram showing the structure of a workflow management system extended for stream data processing.
FIG. 11 is a diagram showing the structure of a stream node for execution in the workflow management system of FIG. 10.
FIG. 12 is a diagram showing the types of stream nodes for execution in the workflow management system of FIG. 10.
FIG. 13 is a diagram showing an example of the WDL workflow semantics of a stream node in the workflow management system according to an embodiment of the present invention.
FIG. 14 is a diagram showing an example of the stream processing procedure of the workflow management system according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of the scope of the invention, which is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

In describing the embodiments of the invention, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the embodiments and may vary according to the intention or custom of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

Hereinafter, an apparatus and method for processing structured stream data according to an embodiment of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of an apparatus for processing structured stream data according to an embodiment of the present invention.

Existing stream data processing apparatuses run on the assumption that a table for storing the data and the preprocessing logic already exist. In contrast, the apparatus for processing structured stream data according to an embodiment of the invention (hereinafter, the 'data processing apparatus') extracts the data schema from the stream data and is implemented so that users can freely apply the logic they want using workflow components.

To process the collected stream data, the data processing apparatus 10 may include a table generation unit 11 and a data processing unit 13.

The table generation unit 11 is the component that creates database tables. It extracts a schema from stream data in XML format and creates a database table based on the extracted schema.

The data processing unit 13 is the component that collects and processes stream data. It collects stream data, preprocesses the collected stream data, and stores the preprocessed stream data in the database table.

The processing of stream data by the data processing unit 13 is described later with reference to FIGS. 10 to 14.

FIG. 2 is a flowchart showing the database table creation procedure of the apparatus for processing structured stream data according to an embodiment of the present invention.

The procedure of FIG. 2 may be performed by the table generation unit 11 of FIG. 1. First, a source node for receiving the data to be stored is selected (S20), and stream data is received through the selected source node for a certain period of time (S21).

The table generation unit 11 then stores the received data in memory as a data graph based on a graph model, thereby constructing the data graph (S22).

After constructing the data graph in step S22, the table generation unit 11 extracts a maximum boundary schema and a minimum boundary schema from the data graph (S23). Both the maximum boundary schema and the minimum boundary schema are graph-based data structures for representing the schema of XML data.

After extracting the maximum and minimum boundary schemas in step S23, the table generation unit 11 generates a schema tree based on them (S24). The generated schema tree represents the schema of the stream data, but because the schema information is organized as a tree, it cannot be mapped directly to relational tables.

After generating the schema tree in step S24, the table generation unit 11 decomposes the schema tree into subtrees and thereby generates the information for mapping the schema tree to relational tables (tree-table mapping information) (S25).

After generating the tree-table mapping information in step S25, the table generation unit 11 creates the tables for storing the stream data according to that mapping information (S26).

The database table creation procedure of the apparatus for processing structured stream data according to an embodiment of the present invention has been described above with reference to FIG. 2. Each step is now examined in more detail by way of example.

FIG. 3 is a diagram showing an example of stream data used in the database table creation procedure according to an embodiment of the present invention. The stream data shown in FIG. 3 represents information about books sold at an online bookstore together with reviews of those books.

The stream data of FIG. 3 has <store> as its root node and is expressed in units of <book>, each representing a book. A book consists of book information contained in <info>, author information contained in <author>, and review information contained in <comment>.

It is assumed that the data source node periodically receives data in the form shown in FIG. 3 each time one of multiple users posts a review of a book.
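The shape of such a record can be illustrated with a small, hypothetical document. FIG. 3 is not reproduced in this text, so the sample below is an assumption: only the element names store, book, info, author, and comment come from the description, while the child fields (SN, title, year, name, age, nation, text) are borrowed from the column names listed for FIG. 9, and all values are invented. The sketch parses the record with Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names under <info>, <author> and <comment>
# are assumed from the FIG. 9 column names, and the values are made up.
SAMPLE = """
<store>
  <book>
    <info><SN>0001</SN><title>Example Title</title><year>2015</year></info>
    <author><name>Author A</name><age>40</age><nation>KR</nation></author>
    <comment><name>reader1</name><text>Good read.</text></comment>
    <comment><name>reader2</name><text>Too long.</text></comment>
  </book>
</store>
"""

root = ET.fromstring(SAMPLE)
for book in root.findall("book"):
    title = book.findtext("info/title")
    authors = [a.findtext("name") for a in book.findall("author")]
    reviews = [c.findtext("text") for c in book.findall("comment")]
    print(title, authors, reviews)
```

Note that the record carries no schema declaration; everything the later steps learn about its structure has to be inferred from the element nesting itself.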

Because XML originated as a model for defining structured documents, XML data is stored in memory in DOM (Document Object Model) form. To extract a schema from XML data, it must therefore first be converted into, and stored as, a labeled directed graph.
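A minimal sketch of this conversion, assuming the graph is kept as a list of (source, label, target) triples with the element name placed on the edge entering each node; this representation is an illustrative choice, not the data structure used in the patent. The function name xml_to_graph and the integer node ids are likewise hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

def xml_to_graph(xml_text):
    """Turn an XML document into labeled directed edges plus leaf values."""
    root = ET.fromstring(xml_text)
    ids = itertools.count()
    edges = []    # (source node id, label, target node id)
    values = {}   # leaf node id -> text content

    def walk(elem, parent_id):
        node_id = next(ids)
        # The element name labels the edge that enters the new node.
        edges.append((parent_id, elem.tag, node_id))
        if len(elem) == 0:
            values[node_id] = (elem.text or "").strip()
        for child in elem:
            walk(child, node_id)

    top = next(ids)   # artificial entry node above the root element
    walk(root, top)
    return edges, values

edges, values = xml_to_graph(
    "<store><book><info><title>T</title></info></book></store>"
)
```

Keeping leaf text in a separate map means the same edge list can be reused for schema extraction, which only looks at labels and structure.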

FIG. 4 is a diagram showing an example of a data graph constructed by the database table creation procedure according to an embodiment of the present invention.

The data graph of FIG. 4 is constructed from the stream data shown in FIG. 3. As shown in FIG. 4, each label name is displayed on the edge entering the corresponding node. The data graph contains all of the input data and serves as the input for extracting the maximum and minimum boundary schemas.

FIG. 5 is a diagram showing an example of a maximum boundary schema extracted by the database table creation procedure according to an embodiment of the present invention.

The maximum boundary schema of FIG. 5 is extracted from the data graph of FIG. 4. As shown in FIG. 5, the maximum boundary schema can be extracted using a DataGuide. A DataGuide is a structure for representing the structure of a database concisely and accurately: it describes every unique label path in the data source exactly once, regardless of how often that path occurs in the data source. This property makes it possible to extract the maximum boundary schema from the data graph.
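The "each unique label path exactly once" property can be sketched as follows. This is a simplification under stated assumptions: it works directly on tree-shaped XML rather than on the data graph, it returns a flat list of paths instead of a full DataGuide structure, and the function name label_paths is hypothetical.

```python
import xml.etree.ElementTree as ET

def label_paths(xml_text):
    """Return every distinct root-to-node label path, in first-seen order."""
    root = ET.fromstring(xml_text)
    seen = set()
    paths = []

    def walk(elem, prefix):
        path = prefix + (elem.tag,)
        if path not in seen:      # record a path only the first time it occurs
            seen.add(path)
            paths.append(path)
        for child in elem:
            walk(child, path)

    walk(root, ())
    return paths

doc = (
    "<store>"
    "<book><info><title>A</title></info><comment><text>x</text></comment></book>"
    "<book><info><title>B</title></info></book>"
    "</store>"
)
paths = label_paths(doc)
```

Although the document contains two books, the path store/book appears once in the result, and the comment path is included even though only one of the books has a review.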

FIG. 6 is a diagram showing an example of a minimum boundary schema extracted by the database table creation procedure according to an embodiment of the present invention.

The minimum boundary schema of FIG. 6 is extracted from the data graph of FIG. 4. As shown in FIG. 6, the minimum boundary schema can be extracted using Datalog, specifically by classifying node types under Datalog's greatest-fixpoint semantics.
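The source does not give the Datalog rules, so the following is only an illustration of the greatest-fixpoint idea, not the patented procedure. It swaps in a plain partition-refinement loop: all nodes start in one type, and a type is split whenever its members differ in their outgoing (label, target type) pairs, until nothing changes. The node names, the edge list, and the function refine_types are all hypothetical.

```python
def refine_types(nodes, edges):
    """Group nodes into types; edges are (source, label, target) triples."""
    type_of = {n: 0 for n in nodes}   # start from the coarsest typing
    while True:
        ids = {}
        refined = {}
        for n in nodes:
            outgoing = frozenset(
                (label, type_of[dst]) for src, label, dst in edges if src == n
            )
            # Nodes stay together only if they shared a type before
            # and have the same outgoing structure now.
            refined[n] = ids.setdefault((type_of[n], outgoing), len(ids))
        if len(ids) == len(set(type_of.values())):
            return refined            # no type was split: fixpoint reached
        type_of = refined

NODES = ["store", "book1", "book2", "info1", "info2", "comment1", "t1", "t2", "c1"]
EDGES = [
    ("store", "book", "book1"), ("store", "book", "book2"),
    ("book1", "info", "info1"), ("book2", "info", "info2"),
    ("book1", "comment", "comment1"),
    ("info1", "title", "t1"), ("info2", "title", "t2"),
    ("comment1", "text", "c1"),
]
types = refine_types(NODES, EDGES)
book_types = {types[dst] for src, label, dst in EDGES if label == "book"}
```

In this toy graph, the book with a review and the book without one end up in different types, so following the book label leads to two distinct types.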

With the maximum boundary schema, no ambiguity arises when distinguishing types for a given data graph, but with the minimum boundary schema ambiguity does arise. For example, in the minimum boundary schema of FIG. 6, two nodes can be reached through the book label, so the type cannot be determined from the label alone.

Thus neither the maximum boundary schema nor the minimum boundary schema alone is suitable for representing the schema of the stream data.

A more accurate data schema therefore needs to be extracted. To this end, the present invention proposes generating a schema tree by using the maximum boundary schema and the minimum boundary schema together.

FIG. 7 is a diagram showing an example of a schema tree generated by the database table creation procedure according to an embodiment of the present invention.

The schema tree of FIG. 7 is generated from the maximum boundary schema shown in FIG. 5 and the minimum boundary schema shown in FIG. 6, and it describes the schema structure of the stream data.

When the schema tree is generated by comparing the minimum boundary schema against the maximum boundary schema, some labels turn out to be repeated and others to be absent altogether. Such repetition must be marked in the schema tree.

For example, if <author> occurs at least once in the minimum boundary schema and <comment> occurs zero or more times, they can be marked with operators such as "+" and "*", as in author+ and comment*.
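One way to arrive at such operators is to count, for every parent element, how often each child label occurs and to compare the minimum and maximum counts. The sketch below does this directly on sample XML rather than on the two boundary schemas, which is a simplification. The "?" operator (zero or one) and the empty string for "exactly once" are assumptions added for completeness; the source mentions only "+" and "*".

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def operator(lo, hi):
    """Map min/max occurrence counts to '', '?', '+' or '*'."""
    if lo >= 1:
        return "" if hi == 1 else "+"
    return "?" if hi == 1 else "*"

def child_cardinalities(xml_text):
    root = ET.fromstring(xml_text)
    per_parent = defaultdict(list)   # parent tag -> one Counter per instance
    for elem in root.iter():
        per_parent[elem.tag].append(Counter(child.tag for child in elem))
    result = {}
    for parent, instances in per_parent.items():
        labels = set().union(*instances)
        for label in labels:
            lo = min(c[label] for c in instances)   # 0 if some instance lacks it
            hi = max(c[label] for c in instances)
            result[(parent, label)] = operator(lo, hi)
    return result

doc = (
    "<store>"
    "<book><info/><author/><author/><comment/><comment/></book>"
    "<book><info/><author/></book>"
    "</store>"
)
card = child_cardinalities(doc)
```

Here every book has at least one author and one of them has two, giving author+, while one book has no review at all, giving comment*.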

To map the schema tree generated as in FIG. 7 to relational tables, a relational schema must be generated by applying an object-relational mapping technique. Because a tree structure is difficult to map directly to relational tables, however, the schema tree must first be decomposed into subtrees, and the object-relational mapping technique is then applied to map them to relational tables.

FIG. 8 is a diagram showing an example of a decomposed schema tree in the database table creation procedure according to an embodiment of the present invention.

When the schema tree is decomposed into subtrees as in FIG. 8, the schema tree is treated as a tree object and, based on the rules below, classes are mapped to tables, attributes to columns, and relationships between classes to foreign-key relationships.

The mapping rules for decomposing the schema tree are as follows.

Rule 1) By the object-relational mapping technique, leaf nodes of the schema tree are mapped to attribute types and non-leaf nodes to class types.

Rule 2) If the parent node and the child node are both class types, they form a class-class relationship, which is mapped to a foreign-key relationship in the relational schema.

Rule 3) If the parent node is a class type and the child node is an attribute type, they are mapped to a table and a column in the relational schema.

Rule 4) A single-valued property is mapped to a column of the class table.

Rule 5) A multi-valued property is created as a separate table and mapped to multiple tuples in that table.

Rule 6) The root element of the schema tree is removed.

Rule 7) If the parent node and the child node are in a class-class relationship, the child node may be removed. When the child node is removed, its class attributes are treated as class attributes of the parent node.

Rules 6 and 7 reduce the hierarchy within the schema tree to prevent unnecessary tables from being created. For example, the root element of the schema tree exists only to satisfy the requirement that an XML document have exactly one root element, so removing it has no effect on schema generation.
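The rules above can be sketched as a small recursive walk over a schema tree. This is one possible reading, not the patent's implementation: the Node class, the to_tables function, the "value" column used for multi-valued leaves, and the choice to apply Rule 7 only to class children that occur exactly once are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    op: str = ""                      # "" (exactly once), "+" or "*"
    children: list = field(default_factory=list)

def to_tables(root):
    """Return {table: {"columns": [...], "parent": table or None}}."""
    tables = {}

    def make_table(node, parent):
        columns = []
        tables[node.label] = {"columns": columns, "parent": parent}
        absorb(node, node.label, columns)

    def absorb(node, table, columns):
        for child in node.children:
            repeated = child.op in ("+", "*")
            if not child.children and not repeated:
                columns.append(child.label)          # Rules 1, 3, 4: leaf -> column
            elif not child.children:
                # Rule 5: multi-valued property -> separate table
                tables[child.label] = {"columns": ["value"], "parent": table}
            elif repeated:
                make_table(child, table)             # Rule 2: class-class -> FK
            else:
                absorb(child, table, columns)        # Rule 7: merge child class

    for child in root.children:                      # Rule 6: drop the root
        make_table(child, None)
    return tables

schema = Node("store", children=[
    Node("book", "+", [
        Node("info", "", [Node("SN"), Node("title"), Node("year")]),
        Node("author", "+", [Node("name"), Node("age"), Node("nation")]),
        Node("comment", "*", [Node("name"), Node("text")]),
    ]),
])
tables = to_tables(schema)
```

For this tree the store root disappears, info is folded into book, and author and comment become separate tables that point back to book. Surrogate keys such as bid are not generated by this sketch.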

A schema tree decomposed as in FIG. 8 can be mapped to a relational schema, but the mapping information between the two must be retained. The mapping information between the schema tree and the relational tables can be used when storing XML data or when extracting the results of a query.

FIG. 9 is a diagram showing an example of schema tree-relational table mapping information generated by the database table creation procedure according to an embodiment of the present invention.

As shown in FIG. 9, data such as that of FIG. 3 is divided among the book, author, and comment tables according to the schema tree-relational table mapping information.

The book table has the columns bid, SN, title, and year and is linked to the author and comment tables by foreign keys. The author table consists of the columns aid, bid, name, age, and nation, and the comment table consists of the columns cid, bid, name, and text. The bid columns of the author and comment tables reference the bid column of the book table.
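The three tables can be written out as DDL. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely for illustration; the source names no database product and no column types, so the types, the primary-key choices, and the create_store_db helper are assumptions. Only the table names, column names, and the bid references come from the description.

```python
import sqlite3

# Column types are assumed; the source lists only table and column names.
DDL = """
CREATE TABLE book (
    bid   INTEGER PRIMARY KEY,
    SN    TEXT,
    title TEXT,
    year  INTEGER
);
CREATE TABLE author (
    aid    INTEGER PRIMARY KEY,
    bid    INTEGER NOT NULL REFERENCES book(bid),
    name   TEXT,
    age    INTEGER,
    nation TEXT
);
CREATE TABLE "comment" (
    cid    INTEGER PRIMARY KEY,
    bid    INTEGER NOT NULL REFERENCES book(bid),
    name   TEXT,
    "text" TEXT
);
"""

def create_store_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn

conn = create_store_db()
conn.execute(
    "INSERT INTO book (bid, SN, title, year) VALUES (?, ?, ?, ?)",
    (1, "0001", "Example Title", 2015),
)
conn.execute(
    'INSERT INTO "comment" (bid, name, "text") VALUES (?, ?, ?)',
    (1, "reader1", "Good read."),
)
```

With the foreign keys in place, a review or author row cannot be stored for a book that does not exist, which mirrors the parent-child nesting of the original XML.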

Once the database has been created as described above, the actual stream data is stored in it. The user can compose and run a workflow that stores the stream data using components.

The process of handling stream data with a workflow is described below.

A conventional workflow management system is a rule-based management system in which users compose workflows using a DAG (Directed Acyclic Graph) model.

The nodes that make up a workflow are generally divided into action nodes and control nodes. An action node is responsible for the actual stream processing, and a control node controls the flow of the workflow.

A workflow management system composed of such nodes is built on an activity-based architecture, so it is structured such that once one node has completed its task, the next node's task is performed according to the rules.

Such an activity-based architecture is suitable for batch processing of big data, but it is a system structure that cannot process streams in real time.

Accordingly, the present invention extends an existing workflow management system to provide a workflow management method capable of both batch and stream data processing.

FIG. 10 is a diagram showing the structure of a workflow management system extended for stream data processing.

The workflow management system shown in FIG. 10 can be applied to the data processing unit 13 of FIG. 1. Referring to FIG. 10, in the workflow management system 100, the client 110 provides a GUI environment in which the user can compose a workflow from workflow components, and it composes the workflow according to the user's requests. The client may be a web-based UI client.

The client 110 is the same as an existing workflow client in that it composes workflows from components, but it additionally supports components for stream data processing.

The WDL (Workflow Description Language) generator 130 converts the workflow composed by the client 110 into XML form. That is, the WDL generator 130 generates the XML-format WDL corresponding to the workflow UI.

Whereas existing systems describe workflow nodes so that they run sequentially, in the present invention the workflow is described in XML so that all workflow nodes are executed and can process stream data in real time.

Each node in the WDL written by the WDL generator 130 is a stream node, and the WDL describes which action is to be performed when a stream is input.

When a stream is input, the workflow execution engine 150 performs the action described in the WDL generated by the WDL generator 130.

FIG. 11 is a diagram showing the structure of a stream node for execution in the workflow management system of FIG. 10.

Referring to FIG. 11, a stream node 1100 includes two agents 1110 and 1130 for receiving and outputting streams in real time.

The source agent 1110 is the agent that collects stream data; it passes the stream received from a stream source node to the component. The target agent 1130 outputs the processed stream.

The two agents thus make real-time stream processing possible in the stream node. When the node runs, one stream processing process and two agent processes are executed.
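The three cooperating parts of a stream node can be sketched with queues. This is an illustration under stated assumptions: threads stand in for the three processes mentioned in the description, the in-memory queue.Queue stands in for whatever transport the real system uses, and the names run_stream_node and STOP are hypothetical.

```python
import queue
import threading

STOP = object()   # sentinel marking the end of the stream

def run_stream_node(source, process, sink):
    """Run a source agent, a processing step and a target agent concurrently."""
    inbox, outbox = queue.Queue(), queue.Queue()

    def source_agent():
        # Collects incoming items and hands them to the processing component.
        for item in source:
            inbox.put(item)
        inbox.put(STOP)

    def processor():
        while (item := inbox.get()) is not STOP:
            outbox.put(process(item))
        outbox.put(STOP)

    def target_agent():
        # Emits each processed item as soon as it is available.
        while (item := outbox.get()) is not STOP:
            sink(item)

    workers = [
        threading.Thread(target=f) for f in (source_agent, processor, target_agent)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

results = []
run_stream_node(iter([1, 2, 3]), lambda x: x * 2, results.append)
```

Because all three parts run at the same time, an item can leave through the target agent while later items are still arriving at the source agent, which is the property the activity-based model lacks.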

도 12는 도 10의 워크플로우 관리 시스템에서의 실행을 위한 스트림 노드의 타입을 도시한 도면이다.Fig. 12 is a diagram showing types of stream nodes for execution in the workflow management system of Fig. 10; Fig.

도 12(a)에 도시된 노드는 이벤트 노드(event node)로서, 입력되는 스트림을 처리하며, 입력되는 모든 스트림을 대상으로 스트림 처리를 수행하는 노드이다.The node shown in FIG. 12 (a) is an event node, which processes input streams and performs stream processing on all input streams.

도 12(b)에 도시된 노드는 복합 이벤트 노드(complex event node)로서, 모든 스트림을 처리 대상으로 하는 노드가 아니고, 스트림 내에 특정 이벤트가 포함된 스트림만을 대상으로 스트림 처리를 수행하는 노드이다.The node shown in FIG. 12B is a complex event node, which is not a node to process all streams but is a node that performs stream processing only on a stream including a specific event in the stream.

The node shown in FIG. 12(c) is a source node, which converts static data stored on a server, such as log data, into a stream. Existing data can therefore be processed as a stream by using a source node.

The node shown in FIG. 12(d) is a target node, which performs the opposite function of the source node: after stream processing is performed, it outputs the result as static data. The target node generates static data once, regardless of the size of the input stream.
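The four node types can be sketched with Python generators. This is only an illustration under stated assumptions: the function names are hypothetical, a "stream" is modeled as an iterable of records, and the complex event node is simplified to a per-record test supplied by the caller (`has_event`).

```python
def source_node(static_records):
    """Source node: turn stored static data (e.g. log lines) into a stream."""
    yield from static_records


def event_node(stream, action):
    """Event node: apply the action to every input record."""
    for record in stream:
        yield action(record)


def complex_event_node(stream, has_event, action):
    """Complex event node: apply the action only to records containing the event."""
    for record in stream:
        if has_event(record):
            yield action(record)


def target_node(stream):
    """Target node: consume the whole stream and emit static data exactly once."""
    return list(stream)
```

Chaining `source_node`, one of the two processing nodes, and `target_node` turns stored log lines into a stream, processes them, and collects the result back into a single static list.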

FIG. 13 is a diagram showing an example of the WDL workflow semantics of a stream node in the workflow management system according to an embodiment of the present invention.

Referring to FIG. 13, a stream node performs the action called by <invoke> on the input stream and generates an output stream.
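As a rough illustration of this semantics, the following Python sketch parses a hypothetical WDL fragment and runs the action named by its <invoke> element on an input stream. The content of FIG. 13 is not reproduced here, so apart from <invoke> every element name, attribute name, and action name below is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical WDL fragment; only the <invoke> element name comes from the text.
WDL = """
<node name="upper" type="event">
  <input stream="in"/>
  <invoke action="to_upper"/>
  <output stream="out"/>
</node>
"""

# Hypothetical registry mapping action names to Python callables.
ACTIONS = {"to_upper": str.upper}


def run_node(wdl_text, input_stream):
    """Look up the action named by <invoke> and apply it to each input record."""
    node = ET.fromstring(wdl_text)
    action = ACTIONS[node.find("invoke").get("action")]
    return [action(record) for record in input_stream]
```

Calling `run_node(WDL, ["a", "b"])` applies the registered `to_upper` action to each record and returns the resulting output stream as a list.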

As described above, when the WDL generator 130 is extended, the workflow execution engine 150 converts each node into an execution language suited to its node type and executes that node's task.

FIG. 14 is a diagram showing an example of the stream processing procedure of the workflow management system according to an embodiment of the present invention.

The stream processing procedure of FIG. 14 is performed by the workflow management system 100 shown in FIG. 10. When the client 110 creates a workflow according to a user's instruction (S1400), the WDL generator 130 generates an XML-format WDL corresponding to the workflow UI (S1410).

Thereafter, when the workflow execution engine 150 executes the workflow (S1420), the processes corresponding to all stream nodes and the two agent processes belonging to each node are run.

In the case of FIG. 14, the stream processing workflow consists of three stream nodes, so three action processes and six agent processes are run, for a total of nine processes.
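The process count follows directly from the node structure: each stream node contributes one action process and two agent processes. A minimal sketch of that arithmetic, with hypothetical names:

```python
AGENTS_PER_NODE = 2  # one source agent and one target agent per stream node


def process_count(num_stream_nodes):
    """Total processes started for a workflow with the given number of stream nodes."""
    action_processes = num_stream_nodes
    agent_processes = AGENTS_PER_NODE * num_stream_nodes
    return action_processes + agent_processes
```

For the three-node workflow of FIG. 14, `process_count(3)` gives 3 + 6 = 9.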

In this way, the extended workflow management system supports stream processing in addition to batch processing, so users can easily process streams using workflows.

Although the structured stream data processing apparatus and method according to the present invention have been described with reference to embodiments, the scope of the present invention is not limited to any particular embodiment, and various alternatives, modifications, and changes may be made within a range obvious to those of ordinary skill in the art to which the present invention pertains.

Accordingly, the embodiments described herein and the accompanying drawings are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments or drawings. The scope of protection of the present invention should be construed according to the claims, and all technical ideas within a scope equivalent to the claims should be construed as falling within the scope of rights of the present invention.

10: structured stream data processing apparatus
11: table generation unit
13: data processing unit

Claims

A structured stream data processing apparatus comprising:
a table generation unit that extracts a schema from stream data in XML format and generates a database table based on the extracted schema; and
a data processing unit that collects stream data, performs preprocessing on the collected stream data, and stores the preprocessed stream data in the database table.