KR101810180B1

KR101810180B1 - Method and apparatus for distributed processing of big data based on user equipment

Info

Publication number: KR101810180B1
Application number: KR1020170061012A
Authority: KR
Inventors: 이치호
Original assignee: 네이버시스템(주); 이치호
Priority date: 2017-05-17
Filing date: 2017-05-17
Publication date: 2017-12-19

Abstract

Disclosed are a method and an apparatus for distributed processing of big data based on user devices. The method comprises the steps of: dividing and loading, by a collection server, collection target data; transmitting, by the collection server, the storage state of the collection target data to a master node; requesting, by the master node, based on state information of a plurality of data nodes, that a plurality of selected data nodes among the plurality of data nodes process the collection target data; and processing, by the plurality of selected data nodes, the collection target data in a distributed manner.

Description


The present invention relates to big data processing and, more particularly, to a method and an apparatus for distributed processing of big data based on user devices.

In 2012, the US research firm Gartner defined big data as data of high volume, high variety, and high velocity. Wikipedia defines big data as "a large collection of structured or unstructured data that exceeds the data collection, storage, management, and analysis capabilities of existing database management tools."

According to the mobile company Tech Spartan, as of 2014 the following took place on the Internet every minute: 600,000 logins on Facebook, 433,000 tweets on Twitter, 4.19 million searches on Google, 306 hours of video uploaded to YouTube, 50,200 app downloads from Apple iTunes, 80,000 transactions on Amazon, and 136,319,444 emails sent. To meet this era of large-volume data, large IT vendors such as IBM, HP, SAP, MS, EMC, and Oracle are developing big data solutions.

With the arrival of the big data era, a new processing and analysis paradigm is required, and the statistical engine R is spreading, chiefly in the United States, as an enterprise analytics platform. R is already in use as an analytics platform at dot-com companies such as Google, Facebook, Yahoo, and Amazon. In particular, IT companies that pursue high-performance analysis of big data, such as Oracle, IBM, and Teradata, have adopted R as the base analytics platform for high performance computing (HPC) systems capable of in-memory analytics or in-database analytics, which run the analysis directly in memory or in the database for real-time results. Despite this wide use, R has the drawback of poor scalability.

Consequently, the base packages of R can process and operate on only a limited volume of data.

R users have proposed several approaches to processing large volumes of data. Among them, the ff and bigmemory packages depart from the conventional approach of loading all data into memory before processing and instead load only the data structure into memory. This approach has the drawback of being very slow, and the limit on physical memory expansion also restricts how much data it can process.

Hadoop is an open-source platform for distributed processing of large volumes of data. Integrated environments for R and Hadoop include Rhipe and Rhadoop.

10-2013-0169007

One aspect of the present invention provides a method for distributed processing of big data based on user devices.

Another aspect of the present invention provides an apparatus that performs the method for distributed processing of big data based on user devices.

A method for distributed processing of big data based on user devices according to one aspect of the present invention may include the steps of: dividing and loading, by a collection server, collection target data; transmitting, by the collection server, the storage state of the collection target data to a master node; requesting, by the master node, based on state information of a plurality of data nodes, that a plurality of selected data nodes among the plurality of data nodes process the collection target data; and processing, by the plurality of selected data nodes, the collection target data in a distributed manner.

The method may further include a step in which the plurality of data nodes transmit data processing availability information to the master node.

The data processing availability information may include information on the communication state, central processing unit (CPU) usage rate, and memory usage rate of the data node.

The selected data node may, based on a data processing instruction contained in push notification information received from the master node, receive its allocated data from the collection server, process it, and transmit the processing result to a service database.

The master node may also identify any unprocessed data node, that is, a data node that has not processed its allocated data by a specific time, and may either re-request processing from that node by push notification or cancel that node's task and assign the task to another data node.

A system for distributed processing of big data based on user devices according to yet another aspect of the present invention may include: a collection server configured to divide and load collection target data and to transmit the storage state of the collection target data to a master node; the master node, configured to request, based on state information of a plurality of data nodes, that a plurality of selected data nodes among the plurality of data nodes process the collection target data; and the plurality of selected data nodes, which process the collection target data in a distributed manner.

The plurality of data nodes may be configured to transmit data processing availability information to the master node.

According to the method and apparatus for distributed processing of big data based on user devices according to an embodiment of the present invention, sharing the computing resources (central processing unit (CPU), memory, and so on) of personal mobile smart devices (smartphones and the like), which are already widespread and are approaching personal computer (PC) class computing capability, can replace the deployment of tens or hundreds of data nodes (servers) for big data processing. For an enterprise, this can enable large cost savings and lower the barrier to adopting a big data processing system.

The individuals who provide mobile resources to the enterprise offer the resources of personal smart devices that sit idle most of the time and, in return, may receive various content services from the enterprise free of charge or receive monetary compensation such as a device provision fee. In other words, the invention can serve as another new IT solution in the field of large-scale distributed data processing that benefits both individuals and enterprises.

FIG. 1 is a conceptual diagram illustrating a big data distributed processing system based on user devices according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a big data distributed processing method based on user devices according to an embodiment of the present invention.
FIG. 3 is a conceptual diagram illustrating a big data distributed processing scenario according to an embodiment of the present invention.
FIG. 4 is a conceptual diagram illustrating a data distributed processing method according to an embodiment of the present invention.

The following detailed description of the invention refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the invention is limited only by the appended claims, properly interpreted, along with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several views.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the drawings.

The Hadoop distributed big data storage and processing system, which was developed by a foreign information technology (IT) company and is now widely used by enterprises and government agencies, and similar modern big data solutions rely on the enterprise directly purchasing or leasing many nodes (server computers) for distributed data processing.

This approach requires enormous investment for the initial construction and maintenance of the big data processing system, and fixed costs such as power and rent for equipment space are incurred even when the system is idle with no data to process. These factors create various barriers to adopting such a system.

The method for distributed processing of big data based on user devices according to an embodiment of the present invention proposes big data processing that uses mobile smart devices. Sharing the computing resources (central processing unit (CPU), memory, and so on) of personal mobile smart devices (smartphones and the like), which are already widespread and are approaching personal computer (PC) class computing capability, can replace the deployment of tens or hundreds of data nodes (servers) for big data processing. For an enterprise, this can enable large cost savings and lower the barrier to adopting a big data processing system.

The individuals who provide mobile resources to the enterprise offer the resources of personal smart devices that sit idle most of the time and, in return, may receive various content services from the enterprise free of charge or receive monetary compensation such as a device provision fee.

The aim is for the method for distributed processing of big data based on user devices according to an embodiment of the present invention to serve as another new IT solution in the field of large-scale distributed data processing that benefits both individuals and enterprises.

FIG. 1 is a conceptual diagram illustrating a big data distributed processing system based on user devices according to an embodiment of the present invention.

FIG. 1 discloses a big data distributed processing system for processing big data in a distributed manner based on user devices.

The big data distributed processing system may include a data node 100, a collection server 120, a master node 140, and a service database 160.

The data node 100 may be a user device, such as a smartphone that a user carries. The data node 100 may be a user device whose user has consented to big data distributed processing, on which a smart application capable of communicating with the collection server 120 and the master node 140 has been installed, and which has been registered as a data node 100.

The data node 100 may periodically communicate with the master node 140 and transmit its device state, and the master node 140 may periodically check the state of the data node 100 and update the related information.

The master node 140 may be a node that manages the collection server 120 and the data nodes 100. The master node 140 may also manage analysis processing algorithms and analysis processing schedules.

The master node 140 may instruct data processing by sending a push notification containing the collection server information, dataset information, and processing algorithm required for big data processing to those registered data nodes 100 whose resource usage rate is low. Based on the data processing instruction contained in the push notification information received from the master node 140, the data node 100 may receive the small amount of data allocated to it from the collection server 120, process it, and store the result in the service database 160.
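The selection of lightly loaded data nodes described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `NodeStatus` fields, the `select_nodes` function, and the 50% thresholds are all assumptions, since the source only says that nodes with a low resource usage rate are chosen.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical status record reported by one data node."""
    node_id: str
    online: bool         # communication state
    cpu_usage: float     # fraction in [0.0, 1.0]
    memory_usage: float  # fraction in [0.0, 1.0]


def select_nodes(statuses, max_cpu=0.5, max_memory=0.5):
    """Return reachable nodes under both thresholds, least loaded first.

    The thresholds are illustrative assumptions; the source does not
    define what counts as a low resource usage rate.
    """
    eligible = [
        s for s in statuses
        if s.online and s.cpu_usage <= max_cpu and s.memory_usage <= max_memory
    ]
    # Sort by combined load, using node_id as a deterministic tie-breaker.
    return sorted(eligible, key=lambda s: (s.cpu_usage + s.memory_usage, s.node_id))


statuses = [
    NodeStatus("phone-a", True, 0.40, 0.30),
    NodeStatus("phone-b", True, 0.90, 0.20),   # CPU too busy
    NodeStatus("phone-c", False, 0.10, 0.10),  # unreachable
    NodeStatus("phone-d", True, 0.10, 0.20),
]
selected_ids = [s.node_id for s in select_nodes(statuses)]
```

In a deployment the master node would then send a push notification to each selected node; the push transport itself is outside the scope of this sketch.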

The collection server 120 may collect internal large-volume data and external collection target data. There may be a plurality of collection servers 120. Internal large-volume data may be collected over various communication protocols (for example, file transfer protocol (FTP)/secure FTP (SFTP), NFS). For example, internal large-volume data may include cadastral map data, crawled unstructured data, telecommunication information, card usage information, photographic image data, and other large-volume big data. External collection target data may be collected over communication protocols (for example, FTP/SFTP, SMTP, HTTP/HTTPS) or by offline collection. For example, external collection target data may include real-time vehicle operation records, Internet of Things (IoT) sensor event information, and other large-volume data.

Data may be divided and loaded onto the collection server 120 over various communication protocols (FTP, HTTP, NFS, and so on), after which the collection server 120 may transmit the storage state to the master node 140.

The service database 160 may receive the data processing results of the data nodes 100 from the master node 140, and the received results may be used as statistical analysis material, visualization material, and decision-making material.

FIG. 2 is a flowchart illustrating a big data distributed processing method based on user devices according to an embodiment of the present invention.

FIG. 2 discloses a method for processing big data in a distributed manner based on the user devices of users that are set as data nodes.

Referring to FIG. 2, a user device may install a distributed processing application for the distributed processing of big data.

The user device installs the distributed processing application, consents to the distributed processing of big data, and can then communicate with the collection server and the master node (step S200).

The user device, with the distributed processing application installed, may be registered with the master node as a data node.

The collection target data is divided and loaded onto the collection server, and the collection server transmits the storage state to the master node (step S210).

Internal and external collection target data may be divided and loaded onto the collection server over various communication protocols such as FTP, HTTP, and NFS, after which the storage state may be transmitted to the master node (step S220).

For example, internal large-volume data may include cadastral map data, crawled unstructured data, telecommunication information, card usage information, photographic image data, and other large-volume big data. External collection target data may be collected over communication protocols (for example, FTP/SFTP, SMTP, HTTP/HTTPS) or by offline collection. For example, external collection target data may include real-time vehicle operation records, Internet of Things (IoT) sensor event information, and other large-volume data.

The data node periodically communicates with the master node and transmits its device state, and the master node may periodically check the state of the data node and update the related information (step S230).

The master node may instruct data processing by sending a push notification containing the collection server information, dataset information, and processing algorithm required for big data processing to those registered data nodes whose resource usage rate is low (step S240).

Based on the data processing instruction contained in the push notification information received from the master node, the data node may receive the small amount of data allocated to it from the collection server, process it, and store the result in the service database (step S250).

That is, the method for distributed processing of big data based on user devices may include the steps of: dividing and loading, by the collection server, the collection target data; transmitting, by the collection server, the storage state of the collection target data to the master node; requesting, by the master node, based on state information of a plurality of data nodes, that a plurality of selected data nodes among the plurality of data nodes process the collection target data; and processing, by the plurality of selected data nodes, the collection target data in a distributed manner.

The selected data node may, based on the data processing instruction contained in the push notification information received from the master node, receive its allocated data from the collection server, process it, and transmit the processing result to the service database.

The master node may identify any unprocessed data node, that is, a data node that has not processed its allocated data by a specific time, and may either re-request processing from that node by push notification or cancel that node's task and assign the task to another data node.
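One way to read the two recovery options (re-request, or cancel and reassign) is as a policy applied to overdue assignments. The sketch below is an assumption-laden illustration: the `Assignment` record, the `handle_overdue` function, and the rule "retry up to `max_retries` times, then reassign if a spare node exists" are hypothetical, because the source does not say how the master node chooses between the two options.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """Hypothetical bookkeeping record kept by the master node."""
    task_id: str
    node_id: str
    sent_at: float      # time the push notification was sent (seconds)
    retries: int = 0
    done: bool = False


def handle_overdue(assignments, now, timeout, max_retries, spare_nodes):
    """Re-request overdue tasks, or move them to a spare node.

    Returns a list of (action, task_id, node_id) tuples describing the
    push notifications the master node would send next.
    """
    spare = list(spare_nodes)
    actions = []
    for a in assignments:
        if a.done or now - a.sent_at < timeout:
            continue  # finished, or still within its deadline
        if a.retries < max_retries or not spare:
            # Ask the same node again by push notification.
            a.retries += 1
            actions.append(("retry", a.task_id, a.node_id))
        else:
            # Cancel the task on the silent node and hand it to another node.
            a.node_id = spare.pop(0)
            a.retries = 0
            actions.append(("reassign", a.task_id, a.node_id))
        a.sent_at = now
    return actions


pending = [
    Assignment("t1", "n1", sent_at=0.0),
    Assignment("t2", "n2", sent_at=0.0, retries=1),
    Assignment("t3", "n3", sent_at=0.0, done=True),
    Assignment("t4", "n4", sent_at=90.0),
]
next_actions = handle_overdue(pending, now=100.0, timeout=60.0,
                              max_retries=1, spare_nodes=["n9"])
```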

FIG. 3 is a conceptual diagram illustrating a big data distributed processing scenario according to an embodiment of the present invention.

FIG. 3 discloses a big data distributed processing scenario.

Referring to FIG. 3, a preparation step for the data nodes (user devices) may be performed (step S300).

For example, a distributed processing application for managing data resources and performing computation may be installed on a plurality of smartphones (for example, 100 units).

Each data node (smartphone) may periodically transmit data processing availability information to the master node based on the settings contained in the app. The data processing availability information may be state information indicating whether the device can process data, such as the terminal's communication state (Wi-Fi or other wireless communication), CPU usage rate, and memory usage rate.

The master node may analyze the data node states contained in the data processing availability information and manage information such as which data nodes are available, how many there are, and their capacity.
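The bookkeeping described here, periodic reports in and a summary of available nodes out, might look like the following sketch. The `NodeRegistry` class, its field names, the `free_mb` capacity figure, and the staleness rule are all hypothetical; the source names only the kinds of information involved (communication state, CPU and memory usage, available nodes, count, capacity).

```python
class NodeRegistry:
    """Hypothetical in-memory registry kept by the master node."""

    def __init__(self, stale_after):
        # Reports older than this many seconds are treated as unavailable.
        self.stale_after = stale_after
        self._reports = {}

    def report(self, node_id, network, cpu_usage, memory_usage, free_mb, now):
        """Store (or overwrite) the latest periodic report from one node."""
        self._reports[node_id] = {
            "network": network,          # e.g. "wifi", "cellular", "offline"
            "cpu_usage": cpu_usage,
            "memory_usage": memory_usage,
            "free_mb": free_mb,          # assumed measure of capacity
            "reported_at": now,
        }

    def summary(self, now):
        """Return the available nodes, their count, and total capacity."""
        fresh = {
            node_id: r for node_id, r in self._reports.items()
            if now - r["reported_at"] <= self.stale_after
            and r["network"] != "offline"
        }
        return {
            "nodes": sorted(fresh),
            "count": len(fresh),
            "capacity_mb": sum(r["free_mb"] for r in fresh.values()),
        }


registry = NodeRegistry(stale_after=60)
registry.report("a", "wifi", 0.2, 0.3, 500, now=0)
registry.report("b", "offline", 0.1, 0.1, 800, now=50)
registry.report("c", "cellular", 0.4, 0.5, 300, now=50)
first = registry.summary(now=100)   # "a" is stale, "b" is offline
registry.report("a", "wifi", 0.2, 0.3, 500, now=90)
second = registry.summary(now=100)
```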

A data collection step may be performed (step S310).

For example, assume that newspaper article text data from the past ten years is collected from external sources onto the collection server by web crawling, and that the data is named the newspaper dataset (for example, 10 GB in size).

The collection server may transmit collection-related information to the master node, reporting that the newspaper dataset has been received and stored normally. The collection-related information may include the collection server name, file name, path, size, collection route, and so on.

The master node may store or update this content in a data management frame resident in memory, and may also physically store and update it in a collection node management database.

A task assignment step for distributed processing may be performed (step S320).

For example, a user (programmer) may, for fast distributed data processing for a specific purpose, instruct the master node to count the occurrences of the word '대한민국' ("Republic of Korea") in the newspaper dataset. The master node may determine which of the data nodes (smartphones) it currently manages are judged to have resources available.

The master node may place into an array the information on the collection server where the newspaper dataset is stored, data fragment information specifying how much of the newspaper dataset (10 GB) to allocate to each data node, and the task to be performed at each data node (the instruction to count the word '대한민국'), include the array in the push notification content, and transmit it to the data nodes.
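The fragment planning and push payload described above can be sketched as below. Everything concrete here is an assumption made for illustration: the function name `plan_push_payloads`, the payload keys, the server name `collector-1`, the dataset name, and the choice to split by contiguous byte ranges. The 10 GB figure is taken as 10 × 1000³ bytes so that 100 nodes receive 100 MB each.

```python
def plan_push_payloads(total_bytes, node_ids, collection_server, dataset, task):
    """Split a dataset into contiguous byte ranges, one payload per node.

    Any remainder bytes are spread over the first nodes so that the
    ranges cover the dataset exactly once.
    """
    if not node_ids:
        raise ValueError("at least one data node is required")
    base, extra = divmod(total_bytes, len(node_ids))
    payloads = []
    offset = 0
    for index, node_id in enumerate(node_ids):
        length = base + (1 if index < extra else 0)
        payloads.append({
            "node_id": node_id,
            "collection_server": collection_server,  # where to fetch from
            "dataset": dataset,
            "offset": offset,                        # first byte of the fragment
            "length": length,                        # fragment size in bytes
            "task": task,                            # work to perform
        })
        offset += length
    return payloads


nodes = [f"phone-{i:03d}" for i in range(100)]
payloads = plan_push_payloads(
    total_bytes=10 * 1000**3,
    node_ids=nodes,
    collection_server="collector-1",       # hypothetical server name
    dataset="newspaper_dataset",           # hypothetical dataset name
    task={"op": "count_word", "word": "대한민국"},
)
```

A plain byte-range split can cut a word or a multi-byte character at a fragment boundary, so a real system would need to align fragments to record boundaries or otherwise handle the edges.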

A big data distributed processing step may be performed (step S330).

A data node that has received the task as a push notification may, based on the data processing allocation information for that data node contained in the push notification, communicate with the collection server and request the specific segment of the newspaper dataset allocated to it.

Based on the information requested by the data nodes, the collection server may divide the newspaper dataset and transmit the pieces to the respective data nodes.

Each of the 100 data nodes may receive its 100-megabyte share from the collection server and count the word '대한민국' in that data fragment.
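The per-node work in this scenario is a simple word count over one fragment. A minimal sketch follows, assuming the fragment arrives as UTF-8 bytes and that the result is reported as a small dictionary; the function name `count_word_in_fragment` and the message fields are hypothetical.

```python
def count_word_in_fragment(node_id, fragment, word, encoding="utf-8"):
    """Count non-overlapping occurrences of `word` in one data fragment.

    Bytes that do not decode (for example a character cut in half at a
    fragment boundary) are dropped rather than raising an error.
    """
    text = fragment.decode(encoding, errors="ignore")
    return {
        "node_id": node_id,
        "success": True,
        "counts": {word: text.count(word)},
    }


sample = "대한민국 만세. 대한민국은 민주공화국이다.".encode("utf-8")
whole = count_word_in_fragment("phone-001", sample, "대한민국")
# Dropping the first byte simulates a fragment that starts mid-character.
clipped = count_word_in_fragment("phone-002", sample[1:], "대한민국")
```

The clipped case shows why boundary handling matters: the first occurrence is lost because its leading character was split between two fragments.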

Each data node may store its result value (for example, '대한민국': 100 count) in execution result information and transmit the success status and the result value to the master node.

A data node that has received confirmation from the master node that the result value was received normally may delete the data fragment it holds to free up storage.

The master node may identify and manage data nodes whose success status has not arrived by a specific time. For example, the master node may re-request the task by push notification, or cancel the task and assign it to another node.

When result values have been received from all data nodes, the master node may aggregate the result values that arrived from all data nodes (for example, '대한민국': 10000 count). The aggregated result value may be stored in the service database under the corresponding task result and returned to the user (programmer).
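The aggregation step can be sketched with `collections.Counter`. The result message shape (`node_id`, `success`, `counts`) and the function name `aggregate_results` are assumptions for illustration; the sketch returns `None` until every expected node has reported successfully, mirroring the condition in the text.

```python
from collections import Counter


def aggregate_results(results, expected_node_ids):
    """Sum per-node counts once every expected node has reported success.

    Only the latest successful result per node is kept, so a node that
    reports twice is not counted twice.
    """
    by_node = {r["node_id"]: r["counts"] for r in results if r["success"]}
    if set(expected_node_ids) - set(by_node):
        return None  # still waiting for some nodes
    total = Counter()
    for counts in by_node.values():
        total.update(counts)
    return dict(total)


expected = [f"phone-{i:03d}" for i in range(100)]
results = [
    {"node_id": node_id, "success": True, "counts": {"대한민국": 100}}
    for node_id in expected
]
partial = aggregate_results(results[:99], expected)
final = aggregate_results(results, expected)
```

With 100 nodes each reporting 100 occurrences, the total matches the 10000 count given in the example above.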

FIG. 4 is a conceptual diagram illustrating a data distributed processing method according to an embodiment of the present invention.

FIG. 4 discloses a method for determining whether a user device takes part in big data distributed processing.

Referring to FIG. 4, a user device must consume its own computing resources for big data distributed processing. The user device may therefore assess its current state and decide whether to lend computing resources for big data distributed processing.

It may be assumed that the user is currently using computing resources above a threshold through the user device. For example, the user may perform data communication at or above a threshold within a given period and use the processor of the user device at or above a threshold.

In this case, when transmitting the data processable information, the user device may determine the amount of data it can process in consideration of the data communication volume and the processor usage within the given period, and transmit that amount.
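One possible way to derive a processing amount from current usage is sketched below. The description does not specify a formula, so everything here is an assumption: the function name `lendable_share`, the representation of usage as fractions between 0 and 1, the limits `cpu_limit` and `net_limit`, and the rule of offering only the headroom below those limits.

```python
def lendable_share(cpu_used: float, net_used: float,
                   cpu_limit: float = 0.7, net_limit: float = 0.7) -> float:
    """Fraction of the device's resources offered for distributed processing.

    cpu_used and net_used are the fractions (0.0 to 1.0) of processor and
    data communication capacity the user consumed in the recent period.
    The limits are hypothetical thresholds, not values from the description.
    """
    for value in (cpu_used, net_used):
        if not 0.0 <= value <= 1.0:
            raise ValueError("usage must be a fraction between 0 and 1")
    if cpu_used >= cpu_limit or net_used >= net_limit:
        return 0.0  # the user is already at or above a threshold
    # Offer only the smaller of the two remaining headrooms.
    return min(cpu_limit - cpu_used, net_limit - net_used)
```

Taking the minimum of the two headrooms keeps the offer conservative: the scarcer resource bounds what the device reports.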

The user device may predict the data usage and processor usage it will incur within a given period. When a first application is running on the user device, the predicted data amount and the predicted processing amount that the first application will use on the user device may be determined. When the first application continuously uses a data amount within a threshold range and a constant processing amount, the predicted data amount (400) and the predicted processing amount (420) may be set to a constant amount based on the average of the past data amounts and processing amounts. Conversely, when the first application uses a data amount or processing amount with large variation, the predicted values may be determined based on the maximum of the past data amounts and processing amounts.
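The choice between an average-based and a maximum-based prediction can be sketched as follows. The description does not say how "large variation" is measured; this example assumes the coefficient of variation (population standard deviation divided by the mean) compared against a hypothetical threshold `stable_ratio`. The function name `predict_usage` is likewise illustrative.

```python
from statistics import mean, pstdev
from typing import Sequence


def predict_usage(history: Sequence[float], stable_ratio: float = 0.2) -> float:
    """Predict next-period usage (data amount or processing amount).

    Stable history   -> predict the average of past samples.
    Volatile history -> predict the maximum of past samples.
    stable_ratio is an assumed threshold on the coefficient of variation.
    """
    if not history:
        raise ValueError("history must not be empty")
    avg = mean(history)
    if avg == 0:
        return float(max(history))  # no relative variation can be computed
    variation = pstdev(history) / avg
    return float(avg) if variation <= stable_ratio else float(max(history))
```

With this rule, a steady history such as `[10, 11, 9, 10]` is predicted by its average, while a bursty history such as `[5, 50, 8, 40]` is predicted by its peak, which errs on the side of reserving resources for the user's own application.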

The user device may also determine the transmission period of the data processable information based on changes in the data amount and processing amount used on the user device. When the first application continuously uses a constant data amount and a constant processing amount, the transmission period of the data processable information may be set relatively long. Conversely, when the first application uses a data amount or processing amount with large variation, the transmission period may be set relatively short so that changes in the data amount and processing amount used by the first application are transmitted more frequently.
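A sketch of a variation-dependent transmission period follows. The description only states that the period is longer for steady usage and shorter for fluctuating usage; the inverse-proportional scaling, the function name `report_interval`, and the parameters `base_interval`, `min_interval`, and `stable_ratio` are assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def report_interval(history: Sequence[float], base_interval: float = 60.0,
                    min_interval: float = 5.0,
                    stable_ratio: float = 0.2) -> float:
    """Seconds to wait before sending the next data processable information.

    Steady usage keeps the (long) base interval; fluctuating usage shortens
    it in proportion to the coefficient of variation, down to min_interval.
    All numeric defaults are hypothetical.
    """
    if not history:
        raise ValueError("history must not be empty")
    avg = mean(history)
    if avg == 0:
        return base_interval  # no usage at all counts as steady
    variation = pstdev(history) / avg
    if variation <= stable_ratio:
        return base_interval
    # Larger variation -> proportionally shorter interval, but never below the floor.
    return max(min_interval, base_interval * stable_ratio / variation)
```

The floor `min_interval` prevents a very bursty application from causing the device to report so often that the reporting itself consumes noticeable data and processor time.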

In addition, according to an embodiment of the present invention, the data processable information may be generated by additionally considering each user's data and processor usage characteristics. The distributed processing application may determine the user's usage characteristics for the user device and generate the data processable information. For example, one user may have a usage pattern of running various applications (or applications that consume a large amount of processing performance) on the user device at the same time, whereas another user may rarely use applications and, even then, use only applications that consume no more than a certain amount of processing performance. Such information on the user's usage characteristics may be recorded in the distributed processing application, and the distributed processing application may generate the data processable information based on these usage characteristics.

The method described above may be implemented as an application, or in the form of program instructions executable by various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the computer software field.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to embodiments, those skilled in the art will understand that the present invention may be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

1. A user-device-based distributed big data processing method, comprising:
Dividing and loading, by a collection server, collection target data;
Transmitting, by the collection server, a storage state of the collection target data to a master node;
Requesting, by the master node, processing of the collection target data from a plurality of selected data nodes among a plurality of data nodes based on state information of the plurality of data nodes; and
Distributing and processing, by the plurality of selected data nodes, the collection target data,
Wherein a user device operating as each of the plurality of data nodes transmits data processable information in consideration of a predicted data communication amount and a predicted processor usage amount,
Wherein the predicted data communication amount is the size of data predicted to be used by an application running on the user device,
Wherein the predicted processor usage amount is the amount of processor usage predicted for the application, and
Wherein a transmission period of the data processable information is set based on changes in the predicted data communication amount and the predicted processor usage amount.

2. The method of claim 1,
Further comprising transmitting, by the plurality of data nodes, the data processable information to the master node,
Wherein the data processable information further includes information on a user's usage characteristics for the user device.

3. The method of claim 2,
Wherein the data processable information includes information on a communication state of the data node, a central processing unit (CPU) usage rate, and a memory usage rate.

4. The method of claim 3,
Wherein each selected data node receives allocated data from the collection server and processes it based on a data processing instruction included in push notification information received from the master node, and transmits the processing result to a service database.

5. The method of claim 4,
Wherein the master node identifies an unprocessed data node that has not processed its allocated data by a predetermined time, and either sends the request to the unprocessed data node again via a push notification or cancels the task of the unprocessed data node and assigns the task to another data node.

6. A user-device-based distributed big data processing system, comprising:
A collection server configured to divide and load collection target data and to transmit a storage state of the collection target data to a master node;
The master node, configured to request processing of the collection target data from a plurality of selected data nodes among a plurality of data nodes based on state information of the plurality of data nodes; and
The plurality of selected data nodes, configured to distribute and process the collection target data,
Wherein a user device operating as each of the plurality of data nodes transmits data processable information in consideration of a predicted data communication amount and a predicted processor usage amount,
Wherein the predicted data communication amount is the size of data predicted to be used by an application running on the user device,
Wherein the predicted processor usage amount is the amount of processor usage predicted for the application, and
Wherein a transmission period of the data processable information is set based on changes in the predicted data communication amount and the predicted processor usage amount.

7. The system of claim 6,
Wherein the plurality of data nodes are configured to transmit the data processable information to the master node,
Wherein the data processable information further includes information on a user's usage characteristics for the user device.

8. The system of claim 7,
Wherein the data processable information includes information on a communication state of the data node, a central processing unit (CPU) usage rate, and a memory usage rate.

9. The system of claim 8,
Wherein each selected data node receives allocated data from the collection server and processes it based on a data processing instruction included in push notification information received from the master node, and transmits the processing result to a service database.

10. The system of claim 9,
Wherein the master node identifies an unprocessed data node that has not processed its allocated data by a predetermined time, and either sends the request to the unprocessed data node again via a push notification or cancels the task of the unprocessed data node and assigns the task to another data node.