KR102338653B1

KR102338653B1 - Method of performing distributed grouping processing for each node to minimize shuffling in cluster environment of large data and apparatus thereof

Info

Publication number: KR102338653B1
Application number: KR1020210133617A
Authority: KR
Inventors: 강현숙; 공용식; 김종완; 류승환
Original assignee: 주식회사 이글루시큐리티
Priority date: 2021-10-08
Filing date: 2021-10-08
Publication date: 2021-12-14

Abstract

A method performed by a device including a cluster consisting of a plurality of nodes comprises the steps of: running a first application; distributing input data and inputting the distributed input data into a plurality of nodes included in the cluster; generating grouping data by performing aggregation on the input data in the plurality of nodes; outputting the grouping data from the cluster including the plurality of nodes in response to the input data; and storing the grouping data in a queue or a buffer associated with the cluster. The step of performing the aggregation on the input data includes a step of generating the grouping data by grouping the input data based on a specific condition within each of the plurality of nodes. Therefore, the present invention can process data efficiently.

Description

Method of performing distributed grouping processing for each node to minimize shuffling in a cluster environment of large data and a device supporting it

The present invention relates to processing large volumes of data using a clustering technique and, more particularly, to an in-memory grouping method for processing large volumes of data in a cluster environment.

When data is processed in a distributed manner across servers and then gathered back together in a grouping step, a data exchange between the servers, called shuffling, takes place.

Shuffling a small amount of data causes no difficulty, but if shuffling occurs continuously on a large amount of data, it imposes a considerable load on the network and memory.

To solve this problem, an efficient method of processing data in a cluster environment is needed.

Laid-open Patent Publication No. 10-2020-0178550, 2020.12.18

To solve the problem described above, the present invention aims to provide a method that processes data in stages through grouping (a grouping operation) performed in the memory of each node.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

According to an embodiment of the present invention for solving the above problem, a method performed by an apparatus including a cluster composed of a plurality of nodes may be provided.

The method may include: executing a first application; distributing input data and inputting it, as a plurality of partial data, to each of a plurality of nodes included in the cluster; generating grouping data by each of the plurality of nodes performing aggregation on the plurality of partial data; outputting the grouping data from the cluster including the plurality of nodes in response to the input data; and storing the grouping data in a queue or buffer connected to the cluster.

The generating of the grouping data may further include aggregating the plurality of partial data, within each of the plurality of nodes, based on a specific condition.

The aggregating of the plurality of partial data may include comparing, based on the specific condition, the values of the data that require grouping among the input data, and grouping the data that share a common value.

The grouping of data having a common value may include: checking metadata in a header of each piece of data; and, when the attribute values of the metadata are identical, determining the data to be grouping target data.

The grouping of data having a common value may include: when the attributes of the metadata are not identical, determining, according to a predetermined similarity criterion, whether the data are similar with respect to the data creator, whether modification permission has been granted, the security level of the data, the data processing history, whether shuffling has been performed, and the number of times shuffling has been performed; and grouping the plurality of partial data when the similarity score resulting from the similarity determination exceeds a reference value.

The method may further include: executing a second application; distributing the stored grouping data and inputting it to the plurality of nodes included in the cluster; and processing the data input to each of the plurality of nodes.

The processing of the data input to each of the plurality of nodes may include omitting the exchange of data between the plurality of nodes.

According to an embodiment of the present invention, a method performed by an apparatus including a database and a plurality of nodes may include: determining a configuration environment of the database; selecting a stored procedure when the database is a relational database; and, when the database is not a relational database, generating grouping data by each of the plurality of nodes merging data.

The generating of the grouping data may further include: generating mapping data in which the key values of the data differ from one another; and collecting the data using the mapping data.

The grouping data may be processed in a distributed manner, without shuffling, in a cluster including the nodes.

According to an embodiment of the present invention, an apparatus including a cluster composed of a plurality of nodes may be provided.

The apparatus including the cluster composed of the plurality of nodes may include: a memory configured to store data; and one or more processors connected to the memory.

The one or more processors may be configured to: execute a first application; distribute input data and input it, as a plurality of partial data, to each of a plurality of nodes included in the cluster; generate grouping data by each of the plurality of nodes performing aggregation on the plurality of partial data; output the grouping data from the cluster including the plurality of nodes in response to the input data; and store the grouping data in a queue or buffer connected to the cluster. When generating the grouping data, the one or more processors may be configured to aggregate the plurality of partial data, within each of the plurality of nodes, based on a specific condition.

According to an embodiment of the present invention, an apparatus including a database and a plurality of nodes may be provided.

The apparatus may include: a memory configured to store data; and one or more processors connected to the memory.

The one or more processors may be configured to: determine a configuration environment of the database; select a stored procedure when the database is a relational database; and, when the database is not a relational database, cause each of the plurality of nodes to merge data to generate grouping data.

Other specific details of the present invention are included in the detailed description and the drawings.

According to the present invention, data can be processed efficiently by solving the problem of performance degradation caused by frequent shuffling in a cluster.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide various embodiments and, together with the detailed description, explain the technical features of the various embodiments.
FIG. 1 is a diagram illustrating distributed processing of data in a cluster.
FIG. 2 is a diagram illustrating data being gathered from multiple nodes into one node as a result of shuffling in a cluster.
FIG. 3 is a diagram illustrating the process by which data subjected to distributed cluster processing according to the present invention is self-aggregated.
FIG. 4 is a conceptual diagram of a cluster and self-aggregation according to the present invention.
FIG. 5 is a conceptual diagram illustrating the data processing procedure according to the present invention.
FIG. 6 is a diagram illustrating an example of a data processing procedure that depends on the DB configuration environment according to the present invention.
FIG. 7 is a diagram illustrating an example of an apparatus in which the present invention can be implemented.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure of the present invention complete and to fully inform those of ordinary skill in the art to which the present invention pertains of the scope of the invention, and the present invention is defined only by the scope of the claims.

The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components other than those stated. Like reference numerals refer to like components throughout the specification, and "and/or" includes each of the stated components and every combination of one or more of them. Although "first", "second", and the like are used to describe various components, these components are of course not limited by these terms. These terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may of course be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined.

Spatially relative terms such as "below", "beneath", "lower", "above", and "upper" may be used to easily describe the relationship between one component and other components as shown in the drawings. Spatially relative terms should be understood as encompassing different orientations of the components in use or operation in addition to the orientation shown in the drawings. For example, when a component shown in a drawing is turned over, a component described as "below" or "beneath" another component may be placed "above" that component. Accordingly, the exemplary term "below" may encompass both the downward and upward directions. Components may also be oriented in other directions, and the spatially relative terms may be interpreted according to the orientation.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The detailed description set forth below together with the accompanying drawings is intended to describe exemplary embodiments and is not intended to represent the only embodiment.

FIG. 1 is a diagram illustrating distributed parallel processing of data in a cluster.

A cluster (or computer cluster) is a concept in which multiple computers are connected through a network so that they can be used as a single computer. Computer clusters are used in many fields, such as computer operating systems, computer hardware, and statistical data. The components of a computer cluster are generally connected by a high-speed local area network. A cluster generally offers better performance and reliability than a single computer and is far more efficient than a single computer.

In FIG. 1, each computing device (e.g., a computer) is represented as one node, and the storage system includes a database (DB). FIG. 1 illustrates the distributed processing of data in a cluster, in which data is stored in each node of the cluster through a command called INPUT.

Comparing the amount of data (or information) that a single server can process with the amount that several servers can process simultaneously, the latter is generally larger. However, configuring and applying a cluster environment to every task processed by a server is not always efficient.

FIG. 2 is a diagram illustrating data being gathered from multiple nodes into one node as a result of shuffling in a cluster.

Shuffling refers to the exchange of data between nodes. For example, an aggregate function (aggregation) operation such as 'group by' in a Select SQL statement of a database (DB) corresponds to this. In this sense, shuffling is also an operation performed to group data.

The data stored in each node in FIG. 2 is gathered into one node through shuffling. Specifically, data is input to each node individually, is stored in one node through shuffling, and is then output through an OUTPUT command and enters the storage system. For example, referring to FIG. 2, the data (DATA) distributed and stored in the first node (Node1), the second node (Node2), and the N-th node (NodeN) may be exchanged between nodes through a shuffling (SHUFFLING) operation, and as a result the data may be aggregated in the first node (Node1).

A node cannot complete the operation with only the data it holds; it completes the operation by combining the data of other nodes. To combine the data of the nodes, the nodes exchange data with one another over the network so that the data is gathered in one node. The node that receives the data must then perform a series of operations, such as combining the received data again in memory.

This operation gathers data that had been distributed across multiple nodes back into one node for computation, and it can consume a large amount of resources such as network bandwidth and memory. In other words, when shuffling occurs in a cluster, data converges from multiple nodes onto one node and causes a bottleneck, so computing power cannot be used efficiently and cluster performance degrades.
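As a rough illustration of why this gathering step is costly, the following Python sketch counts how many records cross node boundaries when every record is moved to a single node before grouping, as in FIG. 2. The node contents, the function name `shuffle_to_single_node`, and the use of a sum as the aggregation are assumptions made for illustration and are not taken from the disclosure.

```python
from collections import Counter

# Hypothetical partial data held by three nodes: (group key, value) records.
nodes = {
    "Node1": [("a", 1), ("b", 2), ("a", 3)],
    "Node2": [("b", 4), ("c", 5)],
    "NodeN": [("a", 6), ("c", 7), ("c", 8)],
}

def shuffle_to_single_node(nodes, target="Node1"):
    """Group by key after moving every record to one target node."""
    transferred = 0
    totals = Counter()
    for name, records in nodes.items():
        if name != target:
            # Every record held by a non-target node crosses the network.
            transferred += len(records)
        for key, value in records:
            totals[key] += value
    return dict(totals), transferred

totals, transferred = shuffle_to_single_node(nodes)
print(totals, transferred)
```

In this toy setup the number of transferred records grows with the raw data volume on the non-target nodes, which is the load that the approach described below tries to avoid.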

Therefore, a method is needed to solve the problem of performance degradation caused by frequent shuffling as described above.

FIG. 3 is a diagram illustrating the process by which data subjected to distributed cluster processing according to the present invention is self-aggregated.

When shuffling occurs frequently, it is difficult for a single application to process all of the work at once. This is because a cluster processes data in a distributed, parallel manner, so the next task in the cluster can proceed only after the data exchange involved in shuffling has finished. The resulting delay affects the entire job. To avoid shuffling as much as possible, it is far more efficient to process the data in stages, using the grouping work that can be done within the memory of each node.

Referring to FIG. 3, the data is first grouped among itself according to logic applied within the data. Each piece of grouped data is stored as-is in a queue (or a buffer or database); this storing is also referred to as a stack (STACK). This process is called self-aggregation.

Here, a queue is one of the basic data structures of a computer and stores data in a first-in, first-out (FIFO) structure, in which the data inserted first comes out first. The queue is not a cluster environment, and the grouped data is stored in it separately.
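The FIFO behavior described here can be shown with a minimal Python sketch. It uses `collections.deque` from the standard library as the queue; the shape of the stored items (a node name plus its grouped data) is an assumption for illustration only.

```python
from collections import deque

# The queue holds grouped data outside the cluster; the item layout is hypothetical.
queue = deque()
queue.append({"node": "Node1", "groups": {"a": 4, "b": 2}})
queue.append({"node": "Node2", "groups": {"b": 4, "c": 5}})

# First in, first out: the item stored first is the first one read back.
first = queue.popleft()
print(first["node"])
```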

Types of databases (DBs) include relational databases (RDBs) and in-memory DBs. Only an RDB is used when a DB procedure is selected; the other types of DB are used when a node loads or merges the grouped data.

FIG. 4 is a conceptual diagram of a cluster and self-aggregation according to the present invention.

Data is input from the storage system to a first cluster by an input command. The data input to the first cluster is distributed and input to each node, and each node performs a primary grouping operation, within its own memory, on the data that it holds.

After each node performs the primary grouping operation on its own, the grouped data is stored in a queue or buffer.

A merge operation is then performed, which reads the primary grouped data stored in the queue or buffer and combines it.

The merged data is again distributed and input to each node in a second cluster. Accordingly, the effect of shuffling is obtained within the nodes and shuffling is not performed for each cluster, so the performance degradation that would result from shuffling for each cluster can be prevented.
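The flow of FIG. 4 can be sketched in Python as follows. This is a minimal single-process sketch, assuming that the grouping is a sum per key and that the merged data is redistributed round-robin over sorted keys; the function names `local_group`, `merge`, and `redistribute` are hypothetical and do not come from the disclosure.

```python
from collections import Counter, deque

def local_group(records):
    """Primary grouping inside one node's memory: sum the values per key."""
    grouped = Counter()
    for key, value in records:
        grouped[key] += value
    return dict(grouped)

def merge(queue):
    """Read the primary grouping results from the queue and combine them."""
    merged = Counter()
    while queue:
        merged.update(queue.popleft())  # adds the per-key sums together
    return dict(merged)

def redistribute(merged, node_count):
    """Split the merged data across the nodes of the second cluster."""
    parts = [dict() for _ in range(node_count)]
    for i, key in enumerate(sorted(merged)):
        parts[i % node_count][key] = merged[key]
    return parts

# Hypothetical partial data already distributed to three nodes of the first cluster.
first_cluster = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
    [("a", 6), ("c", 7), ("c", 8)],
]

queue = deque(local_group(part) for part in first_cluster)
merged = merge(queue)
second_cluster = redistribute(merged, node_count=2)
print(merged, second_cluster)
```

Only the already grouped results pass through the queue in this sketch, so the raw records never move between nodes.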

FIG. 5 is a conceptual diagram illustrating the data processing procedure according to the present invention.

An apparatus including a cluster composed of a plurality of nodes executes a first application (S501).

The apparatus distributes the input data and inputs it to the plurality of nodes included in the cluster (S503). Here, the input data is input to each of the plurality of nodes as a plurality of partial data.

In response to the input data, the data output from the cluster is grouped to generate grouping data (S505). Specifically, each of the plurality of nodes generates grouping data by performing aggregation on the plurality of partial data. In this case, each of the plurality of nodes may aggregate the plurality of partial data based on a specific condition.

The aggregating of the plurality of partial data may include comparing, based on the specific condition, the values of the data that require grouping among the input data, and grouping the data that share a common value.

The grouping of data having a common value may include checking metadata in a header of each piece of data.

If checking the metadata shows that the attribute values of the metadata are identical, each node determines the data having identical metadata attribute values to be grouping target data.

If checking the metadata shows that the attributes of the metadata are not identical, each node determines, according to a predetermined similarity criterion, whether the data are similar with respect to the data creator, whether modification permission has been granted, the security level of the data, the data processing history, whether shuffling has been performed, and the number of times shuffling has been performed. When the similarity score resulting from this determination exceeds a reference value, each node groups the plurality of partial data.
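The decision described above can be sketched in Python as follows. The text does not specify how the similarity score is computed or what the reference value is, so the score used here (the fraction of the six listed items whose values match) and the default reference of 0.5 are assumptions, as are the field names and the function names.

```python
# Hypothetical metadata fields, named after the items listed in the text.
FIELDS = (
    "creator",             # data creator
    "modifiable",          # whether modification permission has been granted
    "security_level",      # security level of the data
    "processing_history",  # data processing history
    "shuffled",            # whether shuffling has been performed
    "shuffle_count",       # number of times shuffling has been performed
)

def similarity_score(meta_a, meta_b):
    """Assumed score: the fraction of listed fields whose values match."""
    matches = sum(meta_a[f] == meta_b[f] for f in FIELDS)
    return matches / len(FIELDS)

def is_grouping_target(meta_a, meta_b, reference=0.5):
    """Group when the metadata is identical, or when the score exceeds the reference."""
    if meta_a == meta_b:
        return True
    return similarity_score(meta_a, meta_b) > reference

base = {"creator": "svc-a", "modifiable": True, "security_level": 2,
        "processing_history": "parsed", "shuffled": False, "shuffle_count": 0}
near = dict(base, creator="svc-b")                       # differs in one field
far = dict(base, creator="svc-c", modifiable=False,
           security_level=5, processing_history="raw")   # differs in four fields

print(is_grouping_target(base, near), is_grouping_target(base, far))
```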

In response to the input data, the grouping data is output from the cluster including the plurality of nodes and stored in a queue or buffer (S507). The stored data may be output immediately, or a second application may be executed so that the data is distributed and input to the plurality of nodes included in the cluster and processed (S509).

According to the above process, shuffling (data exchange between nodes) does not occur among the plurality of nodes, which improves the efficiency of distributed parallel processing of the data.

The grouping data (primary grouping data) stored in the queue or buffer is either output immediately or processed as the second application is executed. The step of executing the second application is described in detail below with reference to FIG. 6.

FIG. 6 is a diagram illustrating an example of a data processing procedure that depends on the DB configuration environment according to the present invention, and shows in detail the step of executing the second application in FIG. 5.

FIG. 6 distinguishes between the case in which the DB configuration environment is a relational database (RDB) and the case in which it is not an RDB (else).

The apparatus including the DB executes the second application (S601) and determines whether the DB configuration environment is an RDB (S602).

If the DB configuration environment is not an RDB, the node loads the primary grouping data and merges the data (S603).

Specifically, to perform the data merge, mapping data in which the key values of the data differ from one another is generated (S604). Using a collection such as a map, whose keys cannot be duplicated, the data loaded in the memory of each node can be grouped on its own. Here, a map collection has a structure that stores Entry objects, each consisting of a key and a value. Both keys and values are objects and, as noted above, keys cannot be duplicated.

The data is merged by being collected based on the mutually different mapping data (S605). In this case, because the data is merged node by node, distributed processing without shuffling becomes possible.
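A minimal sketch of steps S604 and S605 in Python is given below, using a `dict` as the map collection because its keys are unique. The entry layout and the function name `merge_by_key` are assumptions for illustration; the description refers to a map of Entry objects without naming a particular language or library.

```python
def merge_by_key(entries):
    """Collect (key, value) entries into a map whose keys are unique."""
    mapping = {}
    for key, value in entries:
        # A repeated key does not create a second entry; its value joins the existing one.
        mapping.setdefault(key, []).append(value)
    return mapping

# Hypothetical primary grouping data loaded into one node's memory.
loaded = [("a", 4), ("b", 2), ("b", 4), ("a", 6)]
merged = merge_by_key(loaded)
print(merged)
```

Because the map keeps one entry per key, data with the same key ends up together without any exchange with other nodes.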

If the DB configuration environment is an RDB, a DB procedure is executed (S606).

An RDB is a computerized information database built on the simple principle of tabulating simple relationships between keys and values, and it stores data in the form of tables.

A DB procedure (or stored procedure) is a procedure that organizes a series of operations on a database and is stored in a relational database management system; it is also called a persistent storage module.
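The branch of FIG. 6 can be summarized in a short Python sketch. The function name `run_second_application`, the string `"RDB"` used to identify the environment, and the two callables standing in for the stored procedure and the in-node merge are all hypothetical placeholders.

```python
def run_second_application(db_environment, run_db_procedure, merge_in_nodes):
    """Pick the processing path from the DB configuration environment (S602)."""
    if db_environment == "RDB":
        return run_db_procedure()   # S606: relational database path
    return merge_in_nodes()         # S603 to S605: load and merge in each node

# Placeholder callables that only report which path was taken.
result_rdb = run_second_application("RDB", lambda: "procedure", lambda: "merge")
result_other = run_second_application("memory", lambda: "procedure", lambda: "merge")
print(result_rdb, result_other)
```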

본 발명에 따르면, 다음과 같은 데이터 처리 과정을 제안하고 있다.According to the present invention, the following data processing process is proposed.

1. Within each node, the data that would otherwise be shuffled is grouped separately. Each node performs a primary grouping, in its own memory, only on the data it holds.

2. A queue/buffer leading to the next step is prepared. After each node completes its primary grouping, the primary-grouped data is stored in the queue/buffer.

3-1. A second application reads the primary-grouped data from the queue/buffer and merges it. This merge may be done by a separate application or, if a DB is used as the queue/buffer, through the DB's 'Group By' function (an aggregate function operation).

3-2. Alternatively, the primary-grouped data stored in the queue/buffer may be output as is.

4. The merged data is divided among the nodes again, and the above process is repeated.
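Steps 1 through 3-1 above can be sketched as follows. This is a single-process illustration under stated assumptions: the lists in `partitions` stand in for data held by separate nodes, `queue.Queue` stands in for the queue/buffer, and the names `group_locally` and `merge_from_queue` are hypothetical. In an actual cluster the nodes would run on separate machines in parallel.

```python
from collections import Counter
from queue import Queue


def group_locally(partition):
    """Step 1: primary grouping inside one node, using only its own data."""
    local = Counter()
    for key, count in partition:
        local[key] += count
    return dict(local)


def merge_from_queue(buffer):
    """Step 3-1: a second application drains the queue and merges the groups."""
    total = Counter()
    while not buffer.empty():
        total.update(buffer.get())  # Counter.update adds counts per key
    return dict(total)


# Each inner list stands in for the data held by one node (assumed sample data).
partitions = [
    [("login", 1), ("error", 1), ("login", 1)],
    [("error", 1), ("login", 1)],
]

# Step 2: every node stores its primary-grouped result in the queue/buffer.
buffer = Queue()
for partition in partitions:
    buffer.put(group_locally(partition))

merged = merge_from_queue(buffer)
print(merged)  # {'login': 3, 'error': 2}
```

Only one small, already-grouped map per node enters the queue, so the merge step handles far fewer records than a shuffle of the raw data would.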

By grouping data within each node's own memory and storing the grouped data in a queue/buffer, the bottleneck that arises when shuffling causes data from several nodes to converge on a single node can be reduced. This allows computing power to be used efficiently and prevents degradation of cluster performance.

FIG. 7 is a diagram illustrating an example of an apparatus in which the present invention can be implemented.

Referring to FIG. 7, the device may include an input/output unit 310, a communication unit 320, a sensing unit 330 (referred to below as the identification unit), a database 340, and a processor 350.

The input/output unit 310 may be any of various interfaces or connection ports that receive user input or output information to the user. The input/output unit 310 may be divided into an input module and an output module. The input module receives user input, which may take various forms including key input, touch input, and voice input. The input module is a comprehensive concept covering traditional keypads, keyboards, and mice, as well as a touch sensor that detects a user's touch, a microphone that receives a voice signal, a camera that recognizes gestures through image recognition, a proximity sensor (composed of an illuminance sensor, an infrared sensor, or the like) that detects a user's approach, a motion sensor that recognizes a user's motion through an acceleration sensor or a gyro sensor, and any other means that detects or receives user input. Here, the touch sensor may be implemented as a piezoelectric or capacitive touch sensor that detects a touch through a touch panel or touch film attached to the display panel, an optical touch sensor that detects a touch optically, and the like. The input module may also be implemented as an input interface (a USB port, a PS/2 port, etc.) that connects an external input device, rather than as a device that detects user input by itself. The output module outputs various kinds of information and provides it to the user. The output module is a comprehensive concept covering a display that outputs images, a speaker that outputs sound (and/or an amplifier connected to it), a haptic device that generates vibration, and other output means. The output module may also be implemented as a port-type output interface that connects the individual output means described above.

For example, a display-type output module may display text, still images, and video. The term display is used in a broad sense that covers a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, a flat panel display (FPD), a transparent display, a curved display, a flexible display, a three-dimensional (3D) display, a holographic display, a projector, and any other device capable of outputting images. Such a display may take the form of a touch display formed integrally with the touch sensor of the input module.

The communication unit 320 may communicate with an external device. Accordingly, the device may transmit information to and receive information from the external device through the communication unit. For example, the device may use the communication unit to communicate with an external device so that information stored and generated in the illegal parking/stopping warning system is shared.

Here, communication, that is, the transmission and reception of data, may be wired or wireless. To this end, the communication unit may consist of a wired communication module that accesses the Internet or the like through a local area network (LAN); a mobile communication module that accesses a mobile communication network through a mobile communication base station to transmit and receive data; a short-range communication module that uses a wireless local area network (WLAN) method such as Wi-Fi or a wireless personal area network (WPAN) method such as Bluetooth or Zigbee; a satellite communication module that uses a global navigation satellite system (GNSS) such as the Global Positioning System (GPS); or a combination of these. The wireless communication technology used may include Narrowband Internet of Things (NB-IoT) for low-power communication. NB-IoT may be an example of low power wide area network (LPWAN) technology and may be implemented in standards such as LTE Cat (category) NB1 and/or LTE Cat NB2, without being limited to these names. Additionally or alternatively, the wireless communication technology implemented in a wireless device according to various embodiments may perform communication based on LTE-M. LTE-M may be an example of LPWAN technology and may be referred to by various names such as enhanced Machine Type Communication (eMTC). For example, LTE-M may be implemented in at least one of various standards such as 1) LTE Cat 0, 2) LTE Cat M1, 3) LTE Cat M2, 4) LTE non-BL (non-Bandwidth Limited), 5) LTE-MTC, 6) LTE Machine Type Communication, and/or 7) LTE M, without being limited to these names. Additionally or alternatively, the wireless communication technology implemented in a wireless device according to various embodiments may include at least one of ZigBee, Bluetooth, and LPWAN, which are designed for low-power communication, without being limited to these names. For example, ZigBee can create personal area networks (PANs) for small, low-power digital communication based on standards such as IEEE 802.15.4, and may be referred to by various names.

The identification unit 330 may be a comprehensive concept covering a camera that recognizes objects through image recognition, a detection sensor that detects an approaching object, and any other identification/sensing means that detects or receives external input. The identification unit may be understood as being the same as the input module in the input/output unit 310 and/or as being separate from the input module. The identification unit 330 may further include, but is not limited to, one or more of a geomagnetic sensor, an acceleration sensor, a temperature/humidity sensor, an infrared sensor, a gyroscope sensor, a position sensor (e.g., GPS), a barometric pressure sensor, a proximity sensor, an RGB sensor (illuminance sensor), a lidar sensor, an illuminance sensor, and a current sensor. Since a person skilled in the art can intuitively infer the function of each sensor from its name, a detailed description is omitted.

The database 340 may store various kinds of information, either temporarily or semi-permanently. For example, the database may store an operating system (OS) for running the first device and/or the second device, data for hosting a website, and data relating to a program or application (e.g., a web application) for braille generation. The database may also store the modules described above in the form of computer code.

Examples of the database 340 include a hard disk drive (HDD), a solid state drive (SSD), flash memory, read-only memory (ROM), and random access memory (RAM). Such a database may be provided as a built-in type or a detachable type.

The processor 350 controls the overall operation of the device. To this end, the processor 350 may perform computation and processing of various kinds of information and may control the operation of the components of the first device and/or the second device. For example, the processor 350 may execute a program or application for the method of performing distributed grouping processing for each node to minimize shuffling in a cluster environment of large data. The processor 350 may be implemented as a computer or a similar device using hardware, software, or a combination thereof. In hardware, the processor 350 may be provided as an electronic circuit that processes electrical signals to perform a control function; in software, it may be provided as a program that drives the hardware processor 350. Unless otherwise specified in the following description, operations of the first device and/or the second device may be interpreted as being performed under the control of the processor 350. That is, when the modules required for the method of performing distributed grouping processing for each node to minimize shuffling in a cluster environment of large data are executed, the modules may be interpreted as causing the processor 350 to control the first device and/or the second device to perform the operations described below.

In summary, the present invention may be implemented through various means. For example, various embodiments may be implemented in hardware, firmware, software, or a combination thereof.

When implemented in hardware, the method according to the present invention may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

When implemented in firmware or software, the method according to the present invention may be implemented in the form of a module, procedure, or function that performs the functions or operations described below. For example, software code may be stored in a memory and executed by a processor. The memory may be located inside or outside the processor and may exchange data with the processor by various known means. The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. A software module may reside in random access memory (RAM), read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

According to the present invention, data stored in the database 340 is input to each of the plurality of nodes included in the cluster. The processor 350 is configured to generate grouping data by performing aggregation on the input data within the plurality of nodes. Specifically, the processor 350 performs self-aggregation by grouping the input data within each of the plurality of nodes based on a specific condition, thereby generating the grouping data. The processor 350 outputs the grouping data generated within the plurality of nodes, and the output grouping data is stored in the database 340 connected to the cluster.

The processor 350 may also determine whether the configuration environment of the database 340 is an RDB. If it is not an RDB, the data of the database 340 is loaded into the nodes and the processor 350 merges the loaded data. The processor 350 executes the DB procedure only when the configuration environment of the database 340 is an RDB.
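The branch on the database configuration environment can be sketched as below. Several assumptions are made here: the function name `merge_partials`, the `is_rdb` flag, and the table `partial` are hypothetical, and because SQLite has no stored procedures, a plain `GROUP BY` query on an in-memory SQLite database stands in for the DB procedure. In the non-RDB branch the data is merged in memory with a map.

```python
import sqlite3
from collections import Counter


def merge_partials(partials, is_rdb):
    """Merge per-node partial groups, choosing the path by DB environment."""
    if is_rdb:
        # RDB path: the database performs the aggregation.
        # A GROUP BY query stands in for a stored procedure (assumption).
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE partial (k TEXT, v INTEGER)")
        rows = [(k, v) for part in partials for k, v in part.items()]
        conn.executemany("INSERT INTO partial VALUES (?, ?)", rows)
        result = dict(conn.execute("SELECT k, SUM(v) FROM partial GROUP BY k"))
        conn.close()
        return result
    # Non-RDB path: load the data and merge it in memory with a map.
    total = Counter()
    for part in partials:
        total.update(part)
    return dict(total)


# Assumed sample: primary-grouped results from two nodes.
partials = [{"a": 1, "b": 2}, {"a": 3}]
rdb_result = merge_partials(partials, is_rdb=True)
mem_result = merge_partials(partials, is_rdb=False)
```

Both paths yield the same grouping; the only difference is whether the database or the node's memory does the merging.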

Although embodiments of the present invention have been described above with reference to the accompanying drawings, a person of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

310: input/output unit
320: communication unit
330: identification unit
340: database
350: processor

Claims

1. A method performed by an apparatus including a cluster consisting of a plurality of nodes, the method comprising:
running a first application;
distributing input data and inputting it, as a plurality of partial data, to each of the plurality of nodes included in the cluster;
aggregating the plurality of partial data based on a specific condition within each of the plurality of nodes;
generating grouping data based on the aggregation result;
outputting the grouping data from the cluster including the plurality of nodes in response to the input data; and
storing the grouping data in a queue or buffer associated with the cluster,
wherein the step of aggregating the plurality of partial data comprises
grouping data having a common value by comparing, based on the specific condition, the data values that require grouping among the input data,
wherein the step of grouping the data having the common value comprises
checking the metadata in the header of each piece of data and grouping in different ways depending on whether the attribute values of the metadata are the same,
determining the data to be grouping target data when the attribute values of the metadata are the same, and
when the attribute values of the metadata are not the same, determining similarity according to a predetermined similarity criterion based on the creator of the data, whether modification authority is granted, the security level of the data, the data processing history, and whether shuffling is performed, and grouping the plurality of partial data when a similarity score resulting from the similarity determination exceeds a reference value.

(Deleted)

5. The method of claim 1, further comprising:
running a second application;
distributing the stored grouping data and inputting it to the plurality of nodes included in the cluster; and
processing the data input to each of the plurality of nodes.

6. The method of claim 5,
wherein the step of processing the data input to each of the plurality of nodes omits the exchange of data between the plurality of nodes.

7. The method of claim 1, further comprising:
determining a configuration environment of a database included in the device;
selecting a stored procedure if the database is a relational database; and
if the database is not a relational database, generating grouping data by merging data in each of the plurality of nodes.

8. The method of claim 7,
wherein the step of generating the grouping data comprises:
generating mapping data having different key values for each piece of data; and
aggregating the data using the mapping data.

9. The method of claim 8,
wherein the grouping data is processed in a distributed manner, without shuffling, in the cluster including the nodes.

10. A device including a cluster consisting of a plurality of nodes, the device comprising:
a memory configured to store data; and
one or more processors coupled to the memory,
wherein the one or more processors are configured to:
run a first application;
distribute input data and input it, as a plurality of partial data, to each of the plurality of nodes included in the cluster;
aggregate the plurality of partial data based on a specific condition within each of the plurality of nodes;
generate grouping data based on the aggregation result;
output the grouping data from the cluster including the plurality of nodes in response to the input data; and
store the grouping data in a queue or buffer associated with the cluster,
wherein, when aggregating the plurality of partial data, the processor groups data having a common value by comparing, based on the specific condition, the data values that require grouping among the input data,
wherein, when grouping the data having the common value, the processor checks the metadata in the header of each piece of data and groups in different ways depending on whether the attribute values of the metadata are the same,
determines the data to be grouping target data when the attribute values of the metadata are the same, and
when the attribute values of the metadata are not the same, determines similarity according to a predetermined similarity criterion based on the creator of the data, whether modification authority is granted, the security level of the data, the data processing history, and whether shuffling is performed, and groups the plurality of partial data when a similarity score resulting from the similarity determination exceeds a reference value.

11. The device of claim 10,
wherein the processor is configured to:
determine a configuration environment of a database included in the device;
select a stored procedure if the database is a relational database; and
if the database is not a relational database, generate grouping data by merging data in each of the plurality of nodes.