KR20080096004A

KR20080096004A - Data storage and inquiry method for time series analysis of weblog and system for executing the method

Info

Publication number: KR20080096004A
Application number: KR1020070040782A
Authority: KR
Inventors: 김동욱; 박한돌; 정주원; 이윤식
Original assignee: NHN Corp. (엔에이치엔(주))
Priority date: 2007-04-26
Filing date: 2007-04-26
Publication date: 2008-10-30
Also published as: JP5535062B2; JP2010525477A; WO2008133396A1; KR100898465B1

Abstract

A data storage and inquiry method for time series analysis of weblogs is provided. It organizes a weblog into floating fields (field name and field value pairs), floating field tuples, and floating field relation data, so that weblog analysis can be performed easily and efficiently. In a preprocessing step, floating field relation data is generated based on the weblog and the generation time of the weblog (S310). The floating field relation data is then processed according to a data operator input through a user terminal (S320). The preprocessing step comprises: extracting data from the weblog by parsing it (S311); classifying the data by the user login identifier contained in the weblog (S312); and generating the floating field relation data by sorting the data for each user login identifier in order of generation time (S313).

Description

DATA STORAGE AND INQUIRY METHOD FOR TIME SERIES ANALYSIS OF WEBLOG AND SYSTEM FOR EXECUTING THE METHOD

FIG. 1 is an example illustrating a problem of a relational database based on the relational data model in the related art.

FIG. 2 is an example showing an overview of a data storage and inquiry system according to a first embodiment of the present invention.

FIG. 3 is a flowchart illustrating a weblog-based data storage and inquiry method according to the first embodiment of the present invention.

FIG. 4 is an example illustrating a join operation according to the present invention.

FIG. 5 is an example illustrating a split operation according to the present invention.

FIG. 6 is a flowchart illustrating a data storage and inquiry method according to a second embodiment of the present invention.

FIG. 7 is a block diagram illustrating the internal configuration of a data storage and inquiry system according to a third embodiment of the present invention.

<Explanation of reference numerals for the main parts of the drawings>

700: data storage and inquiry system

701: floating field relation data generation unit

702: floating field relation database

703: data operator processing unit

The present invention relates to a data storage and inquiry method for time series analysis of weblogs and a system for performing the method, and more particularly to a data storage and inquiry method and system suitable for time series analysis of data, in particular weblogs.

Most current commercial database products employ the relational data model. Hierarchical and network data models were used before it, but because the relational data model has a more flexible structure and can reflect the real world more realistically, it was implemented in many database systems. As a result, relational database management system (RDBMS) products supporting the relational data model have come to dominate the database market.

The relational data model basically consists of the following three core components, on the premise that every business system in the real world can be expressed with these three.

1. Entity: an event or object to be systematized.

2. Relationship: an association between entities or between attributes.

3. Attribute: an indivisible unit of information that represents a property of an entity or relationship.

However, in a relational database built on the relational data model, the number of fields is fixed in advance. Therefore, when additional information must be expressed depending on the situation, a relational database cannot express it effectively. To work around this problem, relational databases sometimes create spare fields in advance and use them temporarily.

In addition, a relational database cannot represent multiple values repeated within a single field. For example, a list of products purchased by a customer cannot be represented directly in the relational data model. It is therefore common, and also the approach recommended by the relational data model, to split such information into separate tables and later connect the tables with operations such as joins when the information is needed.

As shown at reference numeral 110, the relational data model depends only on the values bound to each related record; the order among the listed records carries no meaning. Consequently, time series analysis is not possible for the values 'b' (112) and 'c' (113) that correspond to the same person 'Park' (111). Moreover, because a predefined, fixed order of attributes is used, there is no indexing mechanism that effectively expresses the relationships between attributes, so the entire record must be reconstructed, as shown at reference numeral 120.

Development based on the relational data model therefore makes the analysis process difficult in tasks such as time series analysis of weblogs. When recording a single person's behavior pattern, the limitations of the relational data model require that the pattern be recorded separately across different tables and different records.

Consequently, tracking the relationships between separately recorded behavior patterns requires computationally expensive operations such as joins, the corresponding SQL (Structured Query Language) queries become very difficult to write, and even once written, such SQL has a very complicated structure to process.

To solve the above problems of the prior art, the present invention proposes a new technique for a data storage and inquiry method for time series analysis of weblogs and a system for performing the method.

An object of the present invention is to make time series analysis of weblogs easy and simple by organizing, storing, and querying a weblog as floating fields, each a set of field name and field value pairs; floating field tuples, each a time-ordered sequence of floating fields; and floating field relation data, a set of floating field tuples.

Another object of the present invention is to provide a data model capable of generating, storing, and querying floating field relation data not only for weblogs but for any data that requires time series analysis.

To achieve the above objects and solve the above problems of the prior art, a weblog-based data storage and inquiry method according to an embodiment of the present invention includes a preprocessing step of generating and maintaining floating field relation data based on a weblog and the generation time of the weblog, and a step of processing the floating field relation data according to a data operator input through a user terminal.

According to one aspect of the present invention, the preprocessing step may include parsing a weblog to extract data from it, classifying the data according to the user login identifiers contained in the weblog, and generating the floating field relation data by sorting the data for each user login identifier in order of generation time.

According to another aspect of the present invention, the floating field relation data may include at least one floating field tuple, in which floating fields, each a set of field name and field value pairs, are arranged in order of generation time. The field name may define an action or state of the user corresponding to a user login identifier, and the field value may include the actual value corresponding to that action or state.

According to yet another aspect of the present invention, the data operator may include at least one of a join operator, a split operator, and a select-and-project operator. The step of processing the floating field relation data according to the data operator input through the user terminal may (1) combine floating field tuples included in the floating field relation data according to the join operator, (2) split a floating field tuple into a plurality of floating field tuples according to the split operator, or (3) extract values from the floating field relation data according to the select-and-project operator and provide them to the user terminal.

In another embodiment of the present invention, a data storage and inquiry method includes a preprocessing step of classifying data by identifier and generating floating field relation data by sorting the data having the same identifier in order of the data's generation time.

Hereinafter, various embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The log collection unit 201 receives the weblogs transmitted from the individual web servers, and the preprocessing unit 202 integrates the weblogs and parses them to extract data. The preprocessing unit 202 may then use the data to generate floating field relation data, which is a set of at least one floating field tuple.
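The integration performed by the preprocessing unit 202 could be realized in many ways. One minimal Python sketch, assuming that each web server's log is already in timestamp order and that each record is a (timestamp, user login identifier, visited node) triple, merges the per-server streams with the standard library's heapq.merge. The record layout and the merge_server_logs name are hypothetical; the text does not say how the logs are integrated.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (timestamp_in_seconds, user_login_id, visited_node)
Record = Tuple[int, str, str]

def merge_server_logs(*server_logs: Iterable[Record]) -> Iterator[Record]:
    """Merge per-server logs, each already sorted by timestamp, into one
    time-ordered stream without loading everything into memory at once."""
    return heapq.merge(*server_logs, key=lambda rec: rec[0])

# Records of the same user 'Kim' are spread over two servers.
server_a = [(100, "Kim", "MainHome"), (400, "Kim", "MainHome")]
server_b = [(200, "Kim", "Mail"), (300, "Lee", "home")]
merged = list(merge_server_logs(server_a, server_b))
```

A streaming merge keeps memory use proportional to the number of servers rather than the total log size, which suits logs spread over many web servers.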

The example of FIG. 2 shows the case where the floating field relation data is kept in a distributed system 203; that is, the floating field relation data may be stored in a floating field relation database of the distributed system 203.

The stored floating field relation data is queried through data operators 204, such as a join operator, a split operator, and a select-and-project operator, input from a user terminal, and the query results may be processed and visualized (205) and provided to the user terminal.

In step S310, the data storage and inquiry system, which performs weblog-based data storage and inquiry, generates and maintains floating field relation data based on the weblog and the generation time of the weblog. The floating field relation data may include at least one floating field tuple, in which floating fields, each a set of field name and field value pairs, are arranged in order of generation time. The field name may define an action or state of the user corresponding to a user login identifier, and the field value may include the actual value corresponding to that action or state.

For example, if the field name is 'id' and the corresponding field value is 'Kim', the data storage and inquiry system can tell that the user login identifier of a given user is 'Kim'. Such a field name and field value pair may be written as the floating field <id=Kim>. As another example, if in addition to <id=Kim> there is another floating field whose field name is 'node', denoting a web page visited by the user, and whose field value is 'home', the actual value of that web page, the data storage and inquiry system can tell that the user with login identifier 'Kim' visited 'home'.

In this way, the floating fields for the same user login identifier can be sorted in order of generation time, and the resulting sequence of sorted floating fields can be defined as a floating field tuple. That is, a floating field tuple may contain data on the states or actions of the user having a given user login identifier, in the order in which those states or actions occurred.

In other words, through these floating field tuples, the floating field relation data, which is a set of floating field tuples, can represent the time-ordered actions and states of every user contained in the weblog, and this makes time series analysis of the weblog possible.
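A minimal in-memory model of these three levels may make the definitions concrete. In this Python sketch, which is an illustration under assumptions rather than the patent's storage format, a floating field is modeled as a single (name, value) pair carrying its generation time, a floating field tuple as a time-ordered list of such fields for one user, and the floating field relation data as a dictionary keyed by user login identifier. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class FloatingField:
    name: str   # action or state, e.g. "id" or "node"
    value: str  # actual value, e.g. "Kim" or "home"
    time: int   # generation time (hypothetical unit: seconds)

# A floating field tuple: the fields of one user, ordered by generation time.
FloatingFieldTuple = List[FloatingField]

# Floating field relation data: a set of tuples, keyed here by user login id.
FloatingFieldRelation = Dict[str, FloatingFieldTuple]

def render(tup: FloatingFieldTuple) -> str:
    """Render a tuple in the <name=value> notation used in the text."""
    return "".join(f"<{f.name}={f.value}>" for f in tup)

relation: FloatingFieldRelation = {
    "Kim": [
        FloatingField("id", "Kim", 0),
        FloatingField("node", "MainHome", 10),
        FloatingField("node", "Mail", 20),
    ],
}
print(render(relation["Kim"]))  # <id=Kim><node=MainHome><node=Mail>
```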

For example, from a floating field tuple such as <id=Kim><node=MainHome><node=Mail><node=MainHome><node=GameA>, the data storage and inquiry system can determine that the user with login identifier 'Kim' accessed the 'Mail' web page via 'MainHome' and then accessed the 'GameA' web page via 'MainHome' again. That is, it becomes possible to query for users who checked 'Mail' via 'MainHome' and then went to 'GameA' via 'MainHome' again.
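The query in this example amounts to looking for a contiguous, time-ordered run of node values inside each user's tuple. The Python sketch below assumes the relation data has been reduced to a mapping from user login identifier to that user's time-ordered list of visited node values; the function name users_with_path is hypothetical.

```python
from typing import Dict, List, Sequence

def users_with_path(relation: Dict[str, List[str]], path: Sequence[str]) -> List[str]:
    """Return the ids of users whose node sequence contains `path`
    as a contiguous, time-ordered run."""
    target = list(path)
    n = len(target)
    hits = []
    for user_id, nodes in relation.items():
        if any(nodes[i:i + n] == target for i in range(len(nodes) - n + 1)):
            hits.append(user_id)
    return hits

relation = {
    "Kim": ["MainHome", "Mail", "MainHome", "GameA"],
    "Lee": ["MainHome", "GameA"],
}
print(users_with_path(relation, ["MainHome", "Mail", "MainHome", "GameA"]))  # ['Kim']
```

Because each user's visits are already stored in order, the query is a scan over one list per user rather than a join across separately stored records.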

To generate and maintain the floating field relation data, the data storage and inquiry system may perform steps S311 to S313 within step S310, as shown in FIG. 3.

In step S311, the data storage and inquiry system parses the weblog to extract data from it. In other words, the system extracts structured information, that is, the data, from the weblog in order to generate the floating field relation data.

In step S312, the data storage and inquiry system classifies the data according to the user login identifiers contained in the weblog. A weblog generally accumulates the visit logs of many people in chronological order. Moreover, because visit logs are typically accumulated on a plurality of web servers, the records of a single user may be distributed across different web servers. Therefore, all of the weblogs distributed across the web servers must first be combined, and then the data belonging to each user must be collected and classified by user login identifier.

In step S313, the data storage and inquiry system generates the floating field relation data by sorting the data for the same user login identifier in order of generation time. That is, the system may generate the floating field relation data by sorting the data classified by user login identifier into generation-time order for each identifier. Each data item may correspond to one of the floating fields described above.
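Steps S311 to S313 can be sketched in Python as a parse, group, and sort pipeline. The tab-separated log format (timestamp, user login identifier, visited page) and the function name build_relation are assumptions made for illustration; the text does not specify a weblog format or parser.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Field = Tuple[int, str, str]  # (generation time, field name, field value)

def build_relation(lines: Iterable[str]) -> Dict[str, List[Field]]:
    groups: Dict[str, List[Field]] = defaultdict(list)
    for line in lines:
        # S311: parse one hypothetical "timestamp<TAB>user_id<TAB>page" line.
        ts, user_id, page = line.rstrip("\n").split("\t")
        # S312: classify the extracted data by user login identifier.
        groups[user_id].append((int(ts), "node", page))
    # S313: sort each user's data by generation time to form floating field tuples.
    return {uid: sorted(fields, key=lambda f: f[0]) for uid, fields in groups.items()}

log = [
    "300\tKim\tMainHome",
    "100\tLee\thome",
    "200\tKim\tMail",
]
relation = build_relation(log)
print(relation["Kim"])  # [(200, 'node', 'Mail'), (300, 'node', 'MainHome')]
```

Sorting on the timestamp alone keeps the original order of entries that share a timestamp, since Python's sort is stable.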

In step S320, the data storage and inquiry system processes the floating field relation data according to a data operator input through the user terminal. The data operator may include at least one of a join operator, a split operator, and a select-and-project operator.

To process the floating field relation data according to the data operator, the data storage and inquiry system may (1) combine floating field tuples included in the floating field relation data according to the join operator, (2) split a floating field tuple into a plurality of floating field tuples according to the split operator, or (3) extract values from the floating field relation data according to the select-and-project operator and provide them to the user terminal. Here, the values may be the set of actual values contained in each of a plurality of floating fields.

The data operator may also combine several of the join, split, and select-and-project operators. That is, it is also possible to split one floating field tuple into a plurality of floating field tuples and then extract values, or to combine a plurality of floating field tuples into one floating field tuple and then extract values.

As described above, a floating field tuple is a sequence of floating fields, that is, a series of information pertaining to one subject, namely one user. When analyzing floating field tuples, one may want to analyze a user's behavior pattern over one week, or over a period of a month or more. In that case, the user's floating fields for the whole period would have to be collected into a floating field tuple. For storage or technical reasons, however, it is more convenient to generate floating field tuples for arbitrary periods dynamically than to store all of these floating fields as a single floating field tuple.

That is, a plurality of floating field tuples can be generated for one user in short time units and combined with the join operator as needed to obtain a floating field tuple describing the user's behavior pattern over the desired period. For example, if floating field tuples are generated per day, combining the tuple for January 2 and the tuple for January 3 with the join operator produces a floating field tuple covering January 2 through January 3.

When a join operator is input through the user terminal, the data storage and inquiry system may perform a join operation according to the condition included in the join operator. The join operation may generate a single floating field tuple by combining the plurality of floating field tuples that satisfy the condition.

As shown in FIG. 4, the data storage and inquiry system may perform a join operation 403 on a first floating field tuple 401 and a second floating field tuple 402 to generate a third floating field tuple 404. As can be seen in the third floating field tuple 404, the floating fields may be arranged in order of generation time.
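The join operation of FIG. 4 can be sketched as a time-ordered merge of several tuples belonging to the same user. This Python sketch assumes each floating field is a (generation time, name, value) triple and each input tuple is already sorted by time; the join_tuples name is hypothetical, and the condition carried by the actual join operator is not modeled.

```python
import heapq
from typing import List, Tuple

Field = Tuple[int, str, str]  # (generation time, field name, field value)

def join_tuples(*tuples: List[Field]) -> List[Field]:
    """Combine several time-ordered floating field tuples of one user into a
    single tuple whose fields remain in generation-time order."""
    return list(heapq.merge(*tuples, key=lambda f: f[0]))

# Daily tuples for one user (times are hypothetical hour offsets from Jan 2, 00:00).
jan2 = [(9, "node", "MainHome"), (10, "node", "Mail")]
jan3 = [(33, "node", "MainHome"), (34, "node", "GameA")]
period = join_tuples(jan2, jan3)  # one tuple covering Jan 2 through Jan 3
```

Merging rather than concatenating keeps the result ordered even if the input tuples overlap in time or are passed in a different order, consistent with the fields of tuple 404 appearing in generation-time order.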

The data storage and inquiry system may perform a split operation according to a split operator input through the user terminal and the condition included in the split operator. The split operation is the inverse of the join operation described with reference to FIG. 4: it separates a selected floating field tuple into floating field tuples of meaningful units.

In weblog analysis, a meaningful user visit is generally recognized in 30-minute units; that is, if a user performs no action for 30 minutes, the user's activity is usually regarded as having concluded. Therefore, even when floating field tuples are built per day, it may be necessary to split them into 30-minute units, and for this purpose the data storage and inquiry system needs to perform the split operation.

As shown in FIG. 5, the data storage and inquiry system may perform a split operation 502 on a first floating field tuple 501 whose floating fields are arranged in order of generation time. In the example of FIG. 5, the first floating field tuple 501 is split in 30-minute units into a plurality of floating field tuples 503. This time unit may be included in the split operator as its condition, and the condition may also specify the selection of the first floating field tuple 501.
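A split by inactivity gap, following the 30-minute convention described above, might look like the Python sketch below. It assumes fields are (generation time in seconds, name, value) triples sorted by time, and it reads the "30-minute unit" as a gap of at least 30 minutes between consecutive fields; splitting on fixed 30-minute clock windows would be another possible reading. The split_tuple name and its gap parameter are hypothetical.

```python
from typing import List, Tuple

Field = Tuple[int, str, str]  # (generation time in seconds, field name, field value)

def split_tuple(tup: List[Field], gap: int = 30 * 60) -> List[List[Field]]:
    """Split one time-ordered floating field tuple into several tuples,
    starting a new one whenever no field occurs for `gap` seconds or more."""
    parts: List[List[Field]] = []
    for field in tup:
        if parts and field[0] - parts[-1][-1][0] < gap:
            parts[-1].append(field)   # still within the same visit
        else:
            parts.append([field])     # idle for >= gap: a new visit begins
    return parts

day = [(0, "node", "MainHome"), (600, "node", "Mail"),
       (5000, "node", "MainHome"), (5100, "node", "GameA")]
sessions = split_tuple(day)  # two tuples: times [0, 600] and [5000, 5100]
```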

The last of the data operators described above, the select-and-project operator, finds a specific pattern in the floating field relation data and extracts values within that pattern. Ordinary regular expression syntax may be used for the select-and-project operator.

For example, for a select-and-project operator input through the user terminal in the form <id=Kim>(<node=(\w*)>)*, the data storage and inquiry system can find all nodes visited by the user whose login identifier is 'Kim', extract the actual values of those nodes, and provide them to the user terminal.

As another example, when a select-and-project operator such as <id=Lee><node=home>(<node=(\w*)>)* is input, the data storage and inquiry system can find all nodes that the user with login identifier 'Lee' visited immediately after visiting 'home', extract the actual values of those nodes, and provide them. Here, a node may denote a web page.
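In most regular expression engines, including Python's re module, a repeated capturing group such as (<node=(\w*)>)* keeps only its last repetition, so applying the pattern directly would return a single node. The Python sketch below therefore emulates the operator by matching the selection part once and then applying the projection item repeatedly at consecutive positions. Rendering each tuple as a <name=value> string and the function names are assumptions for illustration, not the patent's implementation.

```python
import re
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (field name, field value), already in time order

def render(tup: List[Pair]) -> str:
    return "".join(f"<{name}={value}>" for name, value in tup)

def select_and_project(tup: List[Pair], selector: str, item: str) -> Optional[List[str]]:
    """Emulate `selector(item)*`: if the rendered tuple starts with `selector`,
    return group 1 of every consecutive `item` match that follows it."""
    text = render(tup)
    head = re.match(selector, text)
    if head is None:
        return None  # the tuple is not selected
    item_re = re.compile(item)
    pos, values = head.end(), []
    while (m := item_re.match(text, pos)) is not None and m.end() > pos:
        values.append(m.group(1))
        pos = m.end()
    return values

kim = [("id", "Kim"), ("node", "MainHome"), ("node", "Mail"), ("node", "GameA")]
lee = [("id", "Lee"), ("node", "home"), ("node", "News"), ("node", "Cafe")]
print(select_and_project(kim, r"<id=Kim>", r"<node=(\w*)>"))  # ['MainHome', 'Mail', 'GameA']
print(select_and_project(lee, r"<id=Lee><node=home>", r"<node=(\w*)>"))  # ['News', 'Cafe']
```

The `m.end() > pos` guard prevents an endless loop if a caller supplies an item pattern that can match the empty string.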

As described above, according to the present invention, time series analysis of weblogs can be performed easily and simply by organizing, storing, and querying the weblog as floating fields, each a set of field name and field value pairs; floating field tuples, each a time-ordered sequence of floating fields; and floating field relation data, a set of floating field tuples.

In step S601, the data storage and inquiry system classifies data by identifier and generates floating field relation data by sorting the data having the same identifier in order of generation time. The floating field relation data may include at least one floating field tuple, and the floating field tuple may contain floating fields, each a set of field name and field value pairs, arranged in order of generation time. The field name may define an action or state of the user corresponding to the identifier, and the field value may include the actual value corresponding to that action or state.

This data may include any data for which time series analysis is required. That is, floating field relation data can be generated by assigning the same identifier to the data to be analyzed as a time series and sorting the data having the same identifier in order of generation time. For example, the data may be generated by parsing a weblog, and the identifier may include the user login identifier contained in the weblog. In that case, each data item has a field name and a field value as a floating field, and the data items having the same identifier may form a floating field tuple.
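Because the second embodiment is not tied to weblogs, the same preprocessing can be written generically over any records. In this Python sketch, the caller supplies functions that extract the identifier, the generation time, and the (field name, field value) pair from a record. The to_relation name and the sensor-reading data are hypothetical illustrations, not taken from the patent.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")
Field = Tuple[str, str]  # (field name, field value)

def to_relation(records: Iterable[R],
                key: Callable[[R], Hashable],
                time: Callable[[R], float],
                field: Callable[[R], Field]) -> Dict[Hashable, List[Field]]:
    """Classify records by identifier (S601) and order each group by generation time."""
    groups: Dict[Hashable, List[R]] = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return {k: [field(r) for r in sorted(recs, key=time)] for k, recs in groups.items()}

# Hypothetical non-weblog data: (sensor id, timestamp, reading)
readings = [("s1", 2.0, "21C"), ("s2", 1.0, "19C"), ("s1", 1.0, "20C")]
relation = to_relation(readings,
                       key=lambda r: r[0],
                       time=lambda r: r[1],
                       field=lambda r: ("temp", r[2]))
```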

In step S602, the data storage and inquiry system stores and maintains the floating field relation data in a floating field relation database. The floating field relation data may later be modified by the data operators described below and stored back into the floating field relation database, or used to search for and extract the corresponding actual values.

In step S603, the data storage and inquiry system modifies the floating field relation data or extracts values from it according to a data operator input through the user terminal. Here, the values may be the set of actual values contained in each of a plurality of floating fields, and the data operator may include at least one of a join operator, a split operator, and a select-and-project operator.

The join operator may correspond to a join operation that changes the floating field relation data. The join operation may combine different floating field tuples of the same identifier designated by the join operator to generate one floating field tuple. That is, floating field tuples that share an identifier can be combined by the join operation.
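
A minimal sketch of the join operation, assuming (this representation is not from the patent) that a floating field is a `(time, field name, field value)` tuple and that the floating field relation data is a list of `(identifier, [floating fields in time order])` pairs. The function name `join` is hypothetical.

```python
import heapq
from datetime import datetime

# Assumed representation: a floating field is (time, field name, field value),
# and the relation is a list of (identifier, [floating fields in time order]).


def join(relation, identifier):
    """Combine all floating field tuples carrying `identifier` into one tuple."""
    parts = [fields for ident, fields in relation if ident == identifier]
    if len(parts) < 2:
        return list(relation)  # nothing to combine
    # Every part is already time-ordered, so a k-way merge keeps the result ordered.
    merged = list(heapq.merge(*parts, key=lambda f: f[0]))
    result, placed = [], False
    for ident, fields in relation:
        if ident != identifier:
            result.append((ident, fields))
        elif not placed:  # the joined tuple takes the position of the first part
            result.append((ident, merged))
            placed = True
    return result


relation = [
    ("alice", [(datetime(2007, 4, 26, 10, 1), "login", "ok")]),
    ("bob", [(datetime(2007, 4, 26, 10, 3), "view", "item42")]),
    ("alice", [(datetime(2007, 4, 26, 9, 0), "search", "camera")]),
]
for ident, fields in join(relation, "alice"):
    print(ident, [name for _, name, _ in fields])
# alice ['search', 'login']
# bob ['view']
```

Using `heapq.merge` rather than concatenating and re-sorting relies on each input tuple already being time-ordered, which the preprocessing step guarantees.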

In addition, the split operator may correspond to a split operation that changes the floating field relation data. The split operation may divide one floating field tuple designated by the split operator into a plurality of floating field tuples according to a time unit contained in the split operator. In other words, the split operation can divide one floating field tuple, by that time unit, into several floating field tuples that all have the same identifier.
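
The split operation can be sketched with fixed-width time windows. The patent only says that the split operator carries a time unit; aligning windows to a fixed reference point (`EPOCH` below), the name `split`, and the `(time, field name, field value)` representation are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

EPOCH = datetime(1970, 1, 1)  # assumed alignment point for time windows


def split(relation, identifier, unit):
    """Split each floating field tuple of `identifier` into one tuple per window.

    `unit` is a timedelta such as timedelta(days=1). All resulting tuples keep
    the same identifier; tuples of other identifiers are left unchanged.
    """
    result = []
    for ident, fields in relation:
        if ident != identifier:
            result.append((ident, fields))
            continue
        # Fields are time-ordered, so fields in the same window are contiguous.
        for _, window in groupby(fields, key=lambda f: (f[0] - EPOCH) // unit):
            result.append((ident, list(window)))
    return result


relation = [("alice", [
    (datetime(2007, 4, 26, 23, 50), "login", "ok"),
    (datetime(2007, 4, 26, 23, 55), "search", "camera"),
    (datetime(2007, 4, 27, 0, 10), "buy", "camera"),
])]
for ident, fields in split(relation, "alice", timedelta(days=1)):
    print(ident, [f[1] for f in fields])
# alice ['login', 'search']
# alice ['buy']
```

Dividing one `timedelta` by another with `//` yields an integer window index, which is what `groupby` keys on. A session-gap rule (split whenever two events are far apart) would be an alternative reading of "time unit", but it is not what this sketch implements.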

Finally, the select-and-project operator may correspond to a select-and-project operation that extracts values from the floating field relation data. The select-and-project operation may search the floating field relation database for a specific pattern according to a condition contained in the select-and-project operator, and extract the values within the matched pattern. The select-and-project operator may use ordinary regular expressions.
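
Because the text says the select-and-project operator may use ordinary regular expressions, one possible sketch encodes each time-ordered tuple as text and runs a regex over it: a match selects the tuple, and the capture groups are the projected values. The `name=value;` encoding and the function names `serialize` and `select_project` are assumptions, not the patent's format.

```python
import re


def serialize(fields):
    """Encode a time-ordered floating field tuple as 'name=value;name=value;...'.

    Assumed encoding so that a plain regular expression can match an ordered
    sequence of events; names and values must not contain '=' or ';'.
    """
    return "".join(f"{name}={value};" for _, name, value in fields)


def select_project(relation, pattern):
    """Select tuples whose encoding matches `pattern` and project its capture groups."""
    regex = re.compile(pattern)
    results = []
    for ident, fields in relation:
        match = regex.search(serialize(fields))
        if match:  # selection: keep only tuples containing the pattern
            results.append((ident, match.groups()))  # projection: captured values
    return results


# Floating fields are (time, field name, field value); times simplified to integers.
relation = [
    ("alice", [(1, "login", "ok"), (2, "search", "camera"), (3, "buy", "camera")]),
    ("bob", [(1, "login", "ok"), (2, "search", "phone")]),
]
# Pattern: a search later followed by a purchase; capture the keyword and the item.
print(select_project(relation, r"search=([^;]*);(?:[^;]*;)*?buy=([^;]*);"))
# [('alice', ('camera', 'camera'))]
```

The lazy `(?:[^;]*;)*?` lets any number of intervening events sit between the search and the purchase, which is the kind of ordered-sequence condition a time-series query over a weblog typically needs.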

Here, the data operator may also include several of the join operator, the split operator, and the select-and-project operator. That is, it is also possible to split one floating field tuple into a plurality of floating field tuples and then extract values, or to combine a plurality of floating field tuples into one floating field tuple and then extract values.
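
Combining operators can be sketched as plain function composition: split first, then select-and-project on the resulting tuples. The helpers below are compact, self-contained restatements of hypothetical `split` and `select_project` functions; their names, the `(time, field name, field value)` representation, the `name=value;` text encoding, and the window alignment are all assumptions.

```python
import re
from datetime import datetime, timedelta
from itertools import groupby

EPOCH = datetime(1970, 1, 1)  # assumed alignment point for time windows


def split(relation, identifier, unit):
    """Split tuples of `identifier` into one tuple per `unit`-wide time window."""
    out = []
    for ident, fields in relation:
        if ident != identifier:
            out.append((ident, fields))
            continue
        for _, window in groupby(fields, key=lambda f: (f[0] - EPOCH) // unit):
            out.append((ident, list(window)))
    return out


def select_project(relation, pattern):
    """Regex over the 'name=value;' text of each tuple; return the capture groups."""
    regex = re.compile(pattern)
    out = []
    for ident, fields in relation:
        match = regex.search("".join(f"{n}={v};" for _, n, v in fields))
        if match:
            out.append((ident, match.groups()))
    return out


relation = [("alice", [
    (datetime(2007, 4, 26, 9, 0), "search", "camera"),
    (datetime(2007, 4, 26, 9, 5), "buy", "camera"),
    (datetime(2007, 4, 27, 8, 0), "search", "lens"),
])]

# Split into daily tuples first, then project the first keyword searched each day.
daily = split(relation, "alice", timedelta(days=1))
print(select_project(daily, r"search=([^;]*);"))
# [('alice', ('camera',)), ('alice', ('lens',))]
```

Applied to the unsplit relation, the same pattern would report only the first keyword overall (`camera`); splitting first is what turns the query into a per-day one.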

That is, according to the present invention, a weblog is organized into floating fields (sets of field name and field value pairs), floating field tuples (time-ordered sequences of floating fields), and floating field relation data (sets of floating field tuples), and is stored and queried in that form. This makes time-series analysis of the weblog easy and simple, and it also provides a data model with which floating field relation data can be generated, stored, and queried for any data that requires time-series analysis.

FIG. 7 is a block diagram illustrating the internal configuration of a data storage and retrieval system according to a third embodiment of the present invention. As shown in FIG. 7, the data storage and retrieval system 700 includes a floating field relation data generation unit 701, a floating field relation database 702, and a data operator processing unit 703.

The floating field relation data generation unit 701 classifies data by identifier and generates floating field relation data by sorting the data that share an identifier in order of occurrence time. The floating field relation data may include at least one floating field tuple, and a floating field tuple may include, in chronological order, floating fields, each of which is a set of field name and field value pairs. The field name may define an action or state of the user corresponding to the identifier, and the field value may include the actual value corresponding to that action or state.

The floating field relation database 702 stores and maintains the floating field relation data.

The data operator processing unit 703 changes the floating field relation data, or extracts values from it, according to a data operator input through a user terminal. Here, a value may mean the set of actual values contained in each of a plurality of floating fields, and the data operator may include at least one of a join operator, a split operator, and a select-and-project operator.

Here, the join operator may correspond to a join operation that changes the floating field relation data. The join operation may combine different floating field tuples of the same identifier designated by the join operator to generate one floating field tuple. That is, floating field tuples that share an identifier can be combined by the join operation.

The split operator may correspond to a split operation that changes the floating field relation data. The split operation may divide one floating field tuple designated by the split operator into a plurality of floating field tuples according to a time unit contained in the split operator. In other words, the split operation can divide one floating field tuple, by that time unit, into several floating field tuples that all have the same identifier.
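
The three units of system 700 could be organized roughly as below. This is an illustrative sketch, not the patent's implementation: the class names, the in-memory "database", the dict encoding of operators received from the user terminal, the window alignment used by split, and the `name=value;` text encoding used by select are all hypothetical.

```python
import heapq
import re
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import groupby

EPOCH = datetime(1970, 1, 1)  # assumed alignment point for split windows


class RelationDataGenerator:
    """Unit 701: classify records by identifier, then sort each group by time."""

    def generate(self, records):
        # records: iterable of (identifier, time, field name, field value)
        groups = defaultdict(list)
        for ident, time, name, value in records:
            groups[ident].append((time, name, value))
        return [(ident, sorted(fields)) for ident, fields in groups.items()]


class RelationDatabase:
    """Unit 702: stores and maintains the current floating field relation data."""

    def __init__(self):
        self.relation = []

    def store(self, relation):
        self.relation = list(relation)


class DataOperatorProcessor:
    """Unit 703: applies a data operator received from a user terminal.

    Operators are encoded here as plain dicts such as {"op": "join", "id": "alice"};
    this encoding is hypothetical.
    """

    def __init__(self, db):
        self.db = db

    def apply(self, operator):
        rel, ident = self.db.relation, operator.get("id")
        if operator["op"] == "join":  # changes the stored relation data
            parts = [f for i, f in rel if i == ident]
            if parts:
                others = [(i, f) for i, f in rel if i != ident]
                self.db.store(others + [(ident, list(heapq.merge(*parts)))])
        elif operator["op"] == "split":  # changes the stored relation data
            unit, new = operator["unit"], []
            for i, fields in rel:
                if i != ident:
                    new.append((i, fields))
                    continue
                for _, window in groupby(fields, key=lambda f: (f[0] - EPOCH) // unit):
                    new.append((i, list(window)))
            self.db.store(new)
        elif operator["op"] == "select":  # extracts values; relation unchanged
            regex, found = re.compile(operator["pattern"]), []
            for i, fields in rel:
                match = regex.search("".join(f"{n}={v};" for _, n, v in fields))
                if match:
                    found.append((i, match.groups()))
            return found
        return None


db = RelationDatabase()
db.store(RelationDataGenerator().generate([
    ("alice", datetime(2007, 4, 26, 9, 0), "search", "camera"),
    ("alice", datetime(2007, 4, 27, 8, 0), "search", "lens"),
    ("bob", datetime(2007, 4, 26, 9, 30), "view", "item42"),
]))
processor = DataOperatorProcessor(db)
processor.apply({"op": "split", "id": "alice", "unit": timedelta(days=1)})
print(len(db.relation))  # 3: two daily alice tuples and one bob tuple
processor.apply({"op": "join", "id": "alice"})
print(len(db.relation))  # 2
print(processor.apply({"op": "select", "pattern": r"search=([^;]*);"}))
# [('alice', ('camera',))]
```

Join and split write their result back through the database unit, while select only reads from it, mirroring the distinction in the text between operators that change the relation data and the operator that extracts values.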

The embodiments according to the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or they may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to specific details such as particular components, limited embodiments, and drawings, these are provided only to aid a more general understanding of the invention. The invention is not limited to the above embodiments, and those of ordinary skill in the art to which it pertains may make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be limited to the described embodiments. Not only the claims set out below but also everything equal or equivalent to those claims falls within the scope of the spirit of the invention.

According to the present invention, a weblog is organized into floating fields, each a set of field name and field value pairs; floating field tuples, each a time-ordered sequence of floating fields; and floating field relation data, a set of floating field tuples. Storing and querying the weblog in this form makes its time-series analysis easy and convenient.

According to the present invention, a data model is provided with which floating field relation data can be generated, stored, and queried not only for weblogs but for any data that requires time-series analysis.

Claims

A weblog-based data storage and retrieval method, comprising:

a preprocessing step of generating and maintaining floating field relation data based on a weblog and the occurrence time of the weblog; and

processing the floating field relation data according to a data operator input through a user terminal.

The method of claim 1,

wherein the preprocessing step comprises:

parsing a weblog to extract data from the weblog;

classifying the data by a user login identifier included in the weblog; and

generating the floating field relation data by sorting the data in order of occurrence time for the same user login identifier.

The method of claim 1,

wherein the floating field relation data includes at least one floating field tuple, and

the floating field tuple is a set of pairs of field names and field values.

The method of claim 3,

wherein the field name defines an action or state of a user corresponding to a user login identifier, and

the field value includes an actual value corresponding to the action or state.

The method of claim 1,

wherein the data operator includes at least one of a join operator, a split operator, and a select-and-project operator, and

the step of processing the floating field relation data according to a data operator input through a user terminal (1) combines floating field tuples included in the floating field relation data according to the join operator, (2) splits a floating field tuple into a plurality of floating field tuples according to the split operator, or (3) extracts a value from the floating field relation data according to the select-and-project operator and provides the value to the user terminal.

A data storage and retrieval method, comprising:

a preprocessing step of classifying data by identifier and generating floating field relation data by sorting the data in order of occurrence time for the same identifier.

The method of claim 6,

wherein the floating field relation data includes at least one floating field tuple, and

the floating field tuple includes, in order of occurrence time, floating fields, each of which is a set of pairs of field names and field values.

The method of claim 7,

wherein the field name defines an action or state of a user corresponding to the identifier,

The method of claim 6, further comprising:

storing and maintaining the floating field relation data in a floating field relation database; and

changing the floating field relation data or extracting a value of the floating field relation data according to a data operator input through a user terminal.

The method of claim 9,

wherein the data operator includes at least one of a join operator, a split operator, and a select-and-project operator.

The method of claim 10,

wherein the join operator corresponds to a join operation that changes the floating field relation data, and

the join operation is an operation of generating one floating field tuple by combining different floating field tuples of the same identifier designated by the join operator.

The method of claim 10,

wherein the split operator corresponds to a split operation that changes the floating field relation data, and

the split operation is an operation of dividing one floating field tuple designated by the split operator into a plurality of floating field tuples according to a time unit included in the split operator.

The method of claim 10,

wherein the select-and-project operator corresponds to a select-and-project operation that extracts a value of the floating field relation data, and

the select-and-project operation is an operation of searching the floating field relation database for a specific pattern according to a condition included in the select-and-project operator and extracting a value within the found pattern.

The method of claim 6,

wherein the data is generated by parsing a weblog, and

the identifier includes a user login identifier included in the weblog.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 14.

A data storage and retrieval system, comprising:

a floating field relation data generation unit that classifies data by identifier and generates floating field relation data by sorting the data in order of occurrence time for the same identifier;

a floating field relation database that stores and maintains the floating field relation data; and

a data operator processing unit that changes the floating field relation data or extracts a value of the floating field relation data according to a data operator input through a user terminal.

The system of claim 16,

wherein the floating field relation data includes at least one floating field tuple, and

the floating field tuple includes floating fields, each of which is a set of pairs of field names and field values, arranged in chronological order.

The system of claim 17,

The system of claim 16,

The system of claim 19,

wherein the join operation is an operation of combining different floating field tuples of the same identifier designated by the join operator to generate one floating field tuple.

The system of claim 19,

wherein the split operation is an operation of dividing one floating field tuple designated by the split operator into a plurality of floating field tuples according to a time unit included in the split operator.

The system of claim 19,

wherein the select-and-project operation searches the floating field relation database for a specific pattern according to a condition included in the select-and-project operator and extracts a value within the found pattern.